(54) Adaptive codebook-based speech compression system



(57) A speech coding system employing an adaptive codebook model of periodicity is augmented with a
pitch-predictive filter (PPF). This PPF has a delay equal to the integer component of the pitch-period and a gain
which is adaptive based on a measure of periodicity of the speech signal. In accordance with an embodiment
of the present invention, speech processing systems which include a first portion comprising an adaptive
codebook and corresponding adaptive codebook amplifier and a second portion comprising a fixed codebook
coupled to a pitch filter are adapted to delay the adaptive codebook gain, determine the pitch filter gain based
on the delayed adaptive codebook gain, and amplify samples of a signal in the pitch filter based on said
determined pitch filter gain. The adaptive codebook gain is delayed for one subframe. The pitch filter gain
equals the delayed adaptive codebook gain, except when the adaptive codebook gain is either less than 0.2
or greater than 0.8, in which cases the pitch filter gain is set equal to 0.2 or 0.8, respectively.
Description

Field of the Invention 

The present invention relates generally to adaptive codebook-based speech compression systems, and more
particularly to such systems operating to compress speech having a pitch-period less than or equal to adaptive
codebook vector (subframe) length.

Background of the Invention 


Many speech compression systems employ a subsystem to model the periodicity of a speech signal. Two such 
periodicity models in wide use in speech compression (or coding) systems are the pitch prediction filter (PPF) and the 
adaptive codebook (ACB). 

The ACB is fundamentally a memory which stores samples of past speech signals, or derivatives thereof such as
speech residual or excitation signals (hereafter speech signals). Periodicity is introduced (or modeled) by copying
samples from the past (as stored in the memory) speech signal into the present to "predict" what the present speech
signal will look like.

The PPF is a simple IIR filter which is typically of the form

y(n) = x(n) + g_p y(n-M)    (1)

where n is a sample index, y is the output, x is the input, M is a delay value of the filter, and g_p is a scale factor (or gain).
Because the current output of the PPF is dependent on a past output, periodicity is introduced by the PPF. 
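
Equation (1) can be illustrated with a short sketch. The Python function below is only an illustration under stated assumptions: the name pitch_prediction_filter, the memory argument (the last M outputs of the preceding block, oldest first) and the sample values are hypothetical and do not appear in the source.

```python
def pitch_prediction_filter(x, delay, gain, memory=None):
    """Apply y(n) = x(n) + gain * y(n - delay) to one block of samples.

    `memory` is an assumed way of carrying state between blocks: it holds the
    last `delay` outputs of the previous block, oldest first (silence if None).
    """
    if delay <= 0:
        raise ValueError("delay must be a positive number of samples")
    history = list(memory) if memory is not None else [0.0] * delay
    if len(history) != delay:
        raise ValueError("memory must hold exactly `delay` samples")
    y = []
    for n, sample in enumerate(x):
        # y(n - delay) lies in the previous block while n < delay.
        past = y[n - delay] if n >= delay else history[n]
        y.append(sample + gain * past)
    return y


# Hypothetical input: a unit impulse filtered with M = 2 and g_p = 0.5.
impulse_response = pitch_prediction_filter([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2, 0.5)
```

With these values the impulse reappears every two samples with heights 1, 0.5 and 0.25, which is the periodicity introduced by the dependence of the output on a past output.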

Although either the ACB or PPF can be used in speech coding, these periodicity models do not operate identically
under all circumstances. For example, while a PPF and an ACB will yield the same results when the pitch-period of
voiced speech is greater than or equal to the subframe (or codebook vector) size, this is not the case if the pitch-period
is less than the subframe size. This difference is illustrated by Figures 1 and 2, where it is assumed that the pitch-period
(or delay) is 2.5 ms, but the subframe size is 5 ms.

Figure 1 presents a conventional combination of a fixed codebook (FCB) and an ACB as used in a typical CELP
speech compression system (this combination is used in both the encoder and decoder of the CELP system). As shown
in the Figure, FCB 1 receives an index value, I, which causes the FCB to output a speech signal (excitation) vector of a
predetermined duration. This duration is referred to as a subframe (here, 5 ms). Illustratively, this speech excitation
signal will consist of one or more main pulses located in the subframe. For purposes of clarity of presentation, the output
vector will be assumed to have a single large pulse of unit magnitude. The output vector is scaled by a gain, g_c, applied
by amplifier 5.

In parallel with the operation of the FCB 1 and gain 5, ACB 10 generates a speech signal based on previously
synthesized speech. In a conventional fashion, the ACB 10 searches its memory of past speech for samples of speech
which most closely match the original speech being coded. Such samples are in the neighborhood of one pitch-period
(M) in the past from the present sample it is attempting to synthesize. Such past speech samples may not exist if the
pitch is fractional; they may have to be synthesized by the ACB from surrounding speech sample values by linear
interpolation, as is conventional. The ACB uses a past sample identified (or synthesized) in this way as the current sample.
For clarity of explanation, the balance of this discussion will assume that the pitch-period is an integral multiple of the
sample period and that past samples are identified by M for copying into the present subframe. The ACB outputs
individual samples in this manner for the entire subframe (5 ms). All samples produced by the ACB are scaled by a gain,
g_p, applied by amplifier 15.

For current samples in the second half of the subframe, the "past" samples used as the "current" samples are those
samples in the first half of the subframe. This is because the subframe is 5 ms in duration, but the pitch-period, M
(the time period used to identify past samples to use as current samples), is 2.5 ms. Therefore, if the current sample to
be synthesized is at the 4 ms point in the subframe, the past sample of speech is at the 4 ms - 2.5 ms, or 1.5 ms, point in
the same subframe.

The output signals of the FCB and ACB amplifiers 5, 15 are summed at summing circuit 20 to yield an excitation
signal for a conventional linear predictive (LPC) synthesis filter (not shown). A stylized representation of one subframe
of this excitation signal produced by circuit 20 is also shown in Figure 1. Assuming pulses of unit magnitude before
scaling, the system of codebooks yields several pulses in the 5 ms subframe: a first pulse of height g_p, a second pulse
of height g_c, and a third pulse of height g_p. The third pulse is simply a copy of the first pulse created by the ACB. Note
that there is no copy of the second pulse in the second half of the subframe since the ACB memory does not include
the second pulse (and the fixed codebook has but one pulse per subframe).
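
The situation of Figure 1 can be reproduced numerically. The sketch below assumes an 8 kHz sampling rate, so that the 5 ms subframe is 40 samples and the 2.5 ms pitch-period is 20 samples; the pulse positions, the gain values and the function names are hypothetical choices made only for illustration, and the pitch-period is taken as an integer number of samples as in the discussion above.

```python
def adaptive_codebook_vector(past, pitch, length):
    """Copy samples from `pitch` samples in the past, one sample at a time."""
    if not 0 < pitch <= len(past):
        raise ValueError("pitch must be positive and covered by the stored past")
    buf = list(past)
    start = len(buf)
    for n in range(length):
        # When pitch < length, this index eventually points into samples
        # that were just written for the current subframe.
        buf.append(buf[start + n - pitch])
    return buf[start:]


def acb_fcb_excitation(past, pitch, fcb_vector, g_p, g_c):
    """Sum the scaled ACB and FCB vectors (amplifiers 15 and 5, summer 20)."""
    v = adaptive_codebook_vector(past, pitch, len(fcb_vector))
    return [g_p * a + g_c * c for a, c in zip(v, fcb_vector)]


SUBFRAME, PITCH = 40, 20      # 5 ms and 2.5 ms at an assumed 8 kHz rate
past = [0.0] * SUBFRAME
past[24] = 1.0                # unit pulse stored from the previous subframe
fcb = [0.0] * SUBFRAME
fcb[12] = 1.0                 # single unit pulse from the fixed codebook
u = acb_fcb_excitation(past, PITCH, fcb, g_p=0.5, g_c=0.25)
pulses = {n: s for n, s in enumerate(u) if s != 0.0}
```

Here pulses contains three entries, at samples 4, 12 and 24, with heights g_p, g_c and g_p. Nothing appears at sample 32, where a copy of the FCB pulse would fall.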

Figure 2 presents a periodicity model comprising a FCB 25 in series with a PPF 50. The PPF 50 comprises a
summing circuit 45, a delay memory 35, and an amplifier 40. As with the system discussed above, an index, I, applied to
the FCB 25 causes the FCB to output an excitation vector corresponding to the index. This vector has one major pulse.
The vector is scaled by amplifier 30 which applies gain g_c. The scaled vector is then applied to the PPF 50. PPF 50
operates according to equation (1) above. A stylized representation of one subframe of PPF 50 output signal is also
presented in Figure 2. The first pulse of the PPF output subframe is the result of a delay, M, applied to a major pulse
(assumed to have unit amplitude) from the previous subframe (not shown). The next pulse in the subframe is a pulse
contained in the FCB output vector scaled by amplifier 30. Then, due to the delay 35 of 2.5 ms, these two pulses are
repeated 2.5 ms later, respectively, scaled by amplifier 40.

There are major differences between the output signals of the ACB and PPF implementations of the periodicity
model. They manifest themselves in the latter half of the synthesized subframes depicted in Figures 1 and 2. First, the
amplitudes of the third pulses are different: g_p as compared with g_p^2. Second, there is no fourth pulse in the output of the
ACB model. Regarding this missing pulse, when the pitch-period is less than the frame size, the combination of an ACB
and a FCB will not introduce a second fixed codebook contribution in the subframe. This is unlike the operation of a
pitch prediction filter in series with a fixed codebook.
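
The PPF arrangement of Figure 2 can be sketched in the same way. As before, the 8 kHz sampling rate, the pulse positions, the gain values and the function name are assumptions made only for illustration; the memory list stands for the last M output samples of the previous subframe.

```python
def fcb_through_ppf(fcb_vector, g_c, pitch, g_p, memory):
    """Scale the FCB vector by g_c, then apply y(n) = x(n) + g_p * y(n - pitch)."""
    if pitch <= 0 or len(memory) != pitch:
        raise ValueError("memory must hold exactly `pitch` past output samples")
    y = []
    for n, c in enumerate(fcb_vector):
        # While n < pitch the delayed sample comes from the previous subframe.
        past = y[n - pitch] if n >= pitch else memory[n]
        y.append(g_c * c + g_p * past)
    return y


SUBFRAME, PITCH = 40, 20      # 5 ms and 2.5 ms at an assumed 8 kHz rate
memory = [0.0] * PITCH
memory[4] = 1.0               # unit pulse 16 samples before the subframe start
fcb = [0.0] * SUBFRAME
fcb[12] = 1.0                 # single unit pulse from the fixed codebook
y = fcb_through_ppf(fcb, g_c=0.25, pitch=PITCH, g_p=0.5, memory=memory)
pulses = {n: s for n, s in enumerate(y) if s != 0.0}
```

With g_p = 0.5 and g_c = 0.25, the four pulses have heights g_p, g_c, g_p^2 and g_p g_c (0.5, 0.25, 0.25 and 0.125). The last two are the entries that differ from, or are absent in, the ACB output.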

Summary of the Invention

For those speech coding systems which employ an ACB model of periodicity, it has been proposed that a PPF be
used at the output of the FCB. This PPF has a delay equal to the integer component of the pitch-period and a fixed gain
of 0.8. The PPF does accomplish the insertion of the missing FCB pulse in the subframe, but with a gain value which
is speculative. The reason the gain is speculative is that joint quantization of the ACB and FCB gains prevents the
determination of an ACB gain for the current subframe until both ACB and FCB vectors have been determined.

The inventor of the present invention has recognized that the fixed-gain aspect of the pitch loop added to an
ACB-based synthesizer results in synthesized speech which is too periodic at times, resulting in an unnatural "buzziness" of
the synthesized speech.

The present invention solves a shortcoming of the proposed use of a PPF at the output of the FCB in systems
which employ an ACB. The present invention provides a gain for the PPF which is not fixed, but adaptive based on a
measure of periodicity of the speech signal. The adaptive PPF gain enhances PPF performance in that the gain is small
when the speech signal is not very periodic and large when the speech signal is highly periodic. This adaptability avoids
the "buzziness" problem.

In accordance with an embodiment of the present invention, speech processing systems which include a first
portion comprising an adaptive codebook and corresponding adaptive codebook amplifier and a second portion
comprising a fixed codebook coupled to a pitch filter are adapted to delay the adaptive codebook gain, determine the pitch filter
gain based on the delayed adaptive codebook gain, and amplify samples of a signal in the pitch filter based on said
determined pitch filter gain. The adaptive codebook gain is delayed for one subframe. The delayed gain is used since
the quantized gain for the adaptive codebook is not available until the fixed codebook gain is determined. The pitch filter
gain equals the delayed adaptive codebook gain, except when the adaptive codebook gain is either less than 0.2 or
greater than 0.8, in which cases the pitch filter gain is set equal to 0.2 or 0.8, respectively. The limits are there to limit
perceptually undesirable effects due to errors in estimating how periodic the excitation signal actually is.

Brief Description of the Drawings

Figure 1 presents a conventional combination of FCB and ACB systems as used in a typical CELP speech
compression system, as well as a stylized representation of one subframe of an excitation signal generated by the
combination.

Figure 2 presents a periodicity model comprising a FCB and a PPF, as well as a stylized representation of one
subframe of PPF output signal.

Figure 3 presents an illustrative embodiment of a speech encoder in accordance with the present invention. 
Figure 4 presents an illustrative embodiment of a decoder in accordance with the present invention. 

Detailed Description

I. Introduction to the Illustrative Embodiments 

For clarity of explanation, the illustrative embodiments of the present invention are presented as comprising
individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks represent may be
provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of
executing software. For example, the functions of processors presented in Figures 3 and 4 may be provided by a single
shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of
executing software.)
Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16 or
DSP32C, read-only memory (ROM) for storing software performing the operations discussed below, and random 
access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware embodiments, as well as 
custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided. 

The embodiments described below are suitable for use in many speech compression systems such as, for
example, that described in a preliminary Draft Recommendation G.729 to the ITU Standards Body (G.729 Draft), which has
been attached hereto as an Appendix. This speech compression system operates at 8 kbit/s and is based on
Code-Excited Linear-Predictive (CELP) coding. See G.729 Draft Section 2. This draft recommendation includes a complete
description of the speech coding system, as well as the use of the present invention therein. See generally, for example,
figure 2 and the discussion at section 2.1 of the G.729 Draft. With respect to an embodiment of the present invention,
see the discussion at sections 3.8 and 4.1.2 of the G.729 Draft.

II. The Illustrative Embodiments 

Figures 3 and 4 present illustrative embodiments of the present invention as used in the encoder and decoder of
the G.729 Draft. Figure 3 is a modified version of figure 2 from the G.729 Draft which has been augmented to show the
detail of the illustrative encoder embodiment. Figure 4 is similar to figure 3 of the G.729 Draft, augmented to show the details
of the illustrative decoder embodiment. In the discussion which follows, reference will be made to sections of the G.729
Draft where appropriate. A general description of the encoder of the G.729 Draft is presented at section 2.1, while a
general description of the decoder is presented at section 2.2.

A. The Encoder 

In accordance with the embodiment, an input speech signal (16 bit PCM at 8 kHz sampling rate) is provided to a
preprocessor 100. Preprocessor 100 high-pass filters the speech signal to remove undesirable low frequency
components and scales the speech signal to avoid processing overflow. See G.729 Draft Section 3.1. The preprocessed
speech signal, s(n), is then provided to linear prediction analyzer 105. See G.729 Draft Section 3.2. Linear prediction
(LP) coefficients, a_i, are provided to LP synthesis filter 155 which receives an excitation signal, u(n), formed of the
combined output of FCB and ACB portions of the encoder. The excitation signal is chosen by using an analysis-by-synthesis
search procedure in which the error between the original and synthesized speech is minimized according to a
perceptually weighted distortion measure by perceptual weighting filter 165. See G.729 Draft Section 3.3.

Regarding the ACB portion 112 of the embodiment, a signal representing the perceptually weighted distortion
(error) is used by pitch period processor 170 to determine an open-loop pitch-period (delay) used by the adaptive
codebook system 110. The encoder uses the determined open-loop pitch-period as the basis of a closed-loop pitch search.
ACB 110 computes an adaptive codebook vector, v(n), by interpolating the past excitation at a selected fractional pitch.
See G.729 Draft Sections 3.4-3.7. The adaptive codebook gain amplifier 115 applies a scale factor g_p to the output of
the ACB system 110. See G.729 Draft Section 3.9.2.

Regarding the FCB portion 118 of the embodiment, an index generated by the mean squared error (MSE) search
processor 175 is received by the FCB system 120 and a codebook vector, c(n), is generated in response. See G.729
Draft Section 3.8. This codebook vector is provided to the PPF system 128 operating in accordance with the present
invention (see discussion below). The output of the PPF system 128 is scaled by FCB amplifier 145 which applies a
scale factor g_c. Scale factor g_c is determined in accordance with G.729 Draft section 3.9.

The vectors output from the ACB and FCB portions 112, 118 of the encoder are summed at summer 150 and
provided to the LP synthesis filter as discussed above.

B. The PPF System 

As mentioned above, the PPF system addresses the shortcoming of the ACB system exhibited when the
pitch-period of the speech being synthesized is less than the size of the subframe and the fixed PPF gain is too large for
speech which is not very periodic.

PPF system 128 includes a switch 126 which controls whether the PPF 128 contributes to the excitation signal. If
the delay, M, is less than the size of the subframe, L, then the switch 126 is closed and PPF 128 contributes to the
excitation. If M ≥ L, switch 126 is open and the PPF 128 does not contribute to the excitation. A switch control signal K is
set when M < L. Note that use of switch 126 is merely illustrative. Many alternative designs are possible, including, for
example, a switch which is used to by-pass PPF 128 entirely when M ≥ L.
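
The effect of switch 126 can be sketched as a simple conditional. The function name ppf_contribution and its arguments are hypothetical; the sketch takes M as a positive integer number of samples and starts the filter memory from zero on every subframe, and it is one possible reading of the switch rather than a definitive implementation.

```python
def ppf_contribution(c, pitch_delay, gain):
    """Return the FCB vector c after the PPF, honoring the M < L switch.

    pitch_delay plays the role of M and len(c) the role of L. The filter
    memory is assumed empty at the start of the subframe.
    """
    if pitch_delay <= 0:
        raise ValueError("pitch_delay must be a positive number of samples")
    subframe_len = len(c)
    out = list(c)
    if pitch_delay < subframe_len:            # switch closed: PPF contributes
        for n in range(pitch_delay, subframe_len):
            out[n] += gain * out[n - pitch_delay]
    return out                                # switch open: c passes unchanged


# Hypothetical 40-sample subframe with one unit pulse at sample 5.
c = [0.0] * 40
c[5] = 1.0
closed = ppf_contribution(c, pitch_delay=20, gain=0.5)   # M < L
opened = ppf_contribution(c, pitch_delay=40, gain=0.5)   # M >= L
```

In the first call a scaled copy of the pulse appears at sample 25; in the second call the vector is returned unchanged.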

The delay used by the PPF system is the integer portion of the pitch-period, M, as computed by pitch-period
processor 170. The memory of delay processor 135 is cleared prior to PPF 128 operation on each subframe. The gain
applied by the PPF system is provided by delay processor 125. Processor 125 receives the ACB gain, g_p, and stores
it for one subframe (one subframe delay). The stored gain value is then compared with upper and lower limits of 0.8 and
0.2, respectively. Should the stored value of the gain be either greater than the upper limit or less than the lower limit,
the gain is set to the respective limit. In other words, the PPF gain is limited to a range of values greater than or equal
to 0.2 and less than or equal to 0.8. Within that range, the gain may assume the value of the delayed adaptive codebook
gain.
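
The one-subframe delay and the limiting of the gain can be sketched as a small stateful object. The class name DelayedPitchGain, its method names, and in particular the value held before any ACB gain has been stored (the initial argument) are assumptions made for illustration; the source does not state an initial value.

```python
class DelayedPitchGain:
    """Hold the ACB gain for one subframe and limit it to [lower, upper]."""

    def __init__(self, lower=0.2, upper=0.8, initial=0.2):
        if lower > upper:
            raise ValueError("lower limit must not exceed upper limit")
        self.lower = lower
        self.upper = upper
        self._stored = initial   # assumed start-up value, not from the source

    def current(self):
        """PPF gain for the current subframe: the limited, delayed ACB gain."""
        return min(max(self._stored, self.lower), self.upper)

    def store(self, acb_gain):
        """Store this subframe's ACB gain for use in the next subframe."""
        self._stored = acb_gain


# Hypothetical sequence of ACB gains over three subframes.
tracker = DelayedPitchGain(initial=0.5)
ppf_gains = []
for acb_gain in (1.1, 0.05, 0.6):
    ppf_gains.append(tracker.current())   # gain used while coding this subframe
    tracker.store(acb_gain)               # becomes available one subframe later
ppf_gains.append(tracker.current())
```

Each ACB gain takes effect one subframe late, and the values 1.1 and 0.05 are replaced by the limits 0.8 and 0.2, so ppf_gains is 0.5, 0.8, 0.2, 0.6.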

The upper and lower limits are placed on the value of the adaptive PPF gain so that the synthesized signal is
neither overperiodic nor aperiodic, which are both perceptually undesirable. As such, extremely small or large values of the
ACB gain should be avoided.

It will be apparent to those of ordinary skill in the art that ACB gain could be limited to the specified range prior to
storage for a subframe. As such, the processor stores a signal reflecting the ACB gain, whether pre- or post-limited to
the specified range. Also, the exact values of the upper and lower limits are a matter of choice which may be varied to
achieve desired results in any specific realization of the present invention.

C. The Decoder 

The encoder described above (and in the referenced sections of the G.729 Draft) provides a frame of data
representing compressed speech every 10 ms. The frame comprises 80 bits and is detailed in Tables 1 and 9 of the G.729
Draft. Each 80-bit frame of compressed speech is sent over a communication channel to a decoder which synthesizes
a speech signal (representing two subframes) based on the frame produced by the encoder. The channel over which
the frames are communicated (not shown) may be of any type (such as conventional telephone networks, cellular or
wireless networks, ATM networks, etc.) and/or may comprise a storage medium (such as magnetic storage,
semiconductor RAM or ROM, optical storage such as CD-ROM, etc.).

An illustrative decoder in accordance with the present invention is presented in Figure 4. The decoder is much like
the encoder of Figure 3 in that it includes both an adaptive codebook portion 240 and a fixed codebook portion 200.
The decoder decodes transmitted parameters (see G.729 Draft Section 4.1) and performs synthesis to obtain
reconstructed speech.

The FCB portion includes a FCB 205 responsive to a FCB index, I, communicated to the decoder from the encoder.
The FCB 205 generates a vector, c(n), of length equal to a subframe. See G.729 Draft Section 4.1.3. This vector is
applied to the PPF 210 of the decoder. The PPF 210 operates as described above (based on a value of ACB gain, g_p,
delayed in delay processor 225 and ACB pitch-period, M, both received from the encoder via the channel) to yield a
vector for application to the FCB gain amplifier 235. The amplifier, which applies a gain, g_c, from the channel,
generates a scaled version of the vector produced by the PPF 210. See G.729 Draft Section 4.1.4. The output signal of the
amplifier 235 is supplied to summer 255 which generates an excitation signal, u(n).

Also provided to the summer 255 is the output signal generated by the ACB portion 240 of the decoder. The ACB
portion 240 comprises the ACB 245 which generates an adaptive codebook contribution, v(n), of length equal to a
subframe based on past excitation signals and the ACB pitch-period, M, received from the encoder via the channel. See G.729
Draft Section 4.1.2. This vector is scaled by amplifier 250 based on gain factor, g_p, received over the channel. This
scaled vector is the output of ACB portion 240.
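
The way the two decoder portions combine at summer 255 can be sketched as follows. This is a simplified illustration, not the decoder of the G.729 Draft: the function name decoder_excitation is hypothetical, the pitch-period is taken as an integer number of samples (no fractional interpolation), the PPF memory starts from zero for each subframe, and ppf_gain stands for the delayed and limited ACB gain, which is assumed to have been computed elsewhere.

```python
def decoder_excitation(past_u, c, pitch, g_p, g_c, ppf_gain):
    """Form u(n) = g_p * v(n) + g_c * PPF{c(n)} for one subframe."""
    length = len(c)
    if not 0 < pitch <= len(past_u):
        raise ValueError("pitch must be positive and covered by past excitation")

    # ACB portion: copy the excitation found `pitch` samples in the past.
    buf = list(past_u)
    start = len(buf)
    for n in range(length):
        buf.append(buf[start + n - pitch])
    v = buf[start:]

    # FCB portion: pitch filter on c(n), active only when pitch < length.
    f = list(c)
    if pitch < length:
        for n in range(pitch, length):
            f[n] += ppf_gain * f[n - pitch]

    # Amplifiers apply g_p and g_c, then the summer adds the two branches.
    return [g_p * a + g_c * b for a, b in zip(v, f)]


# Hypothetical subframe: 40 samples, pitch of 20 samples, unit pulses.
past_u = [0.0] * 40
past_u[24] = 1.0
c = [0.0] * 40
c[12] = 1.0
u = decoder_excitation(past_u, c, pitch=20, g_p=0.5, g_c=0.25, ppf_gain=0.5)
pulses = {n: s for n, s in enumerate(u) if s != 0.0}
```

Compared with a plain ACB and FCB combination, the excitation now contains a fourth pulse at sample 32 of height g_c times the PPF gain (0.125 here), while the ACB pulses at samples 4 and 24 keep the height g_p.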

The excitation signal, u(n), produced by summer 255 is applied to an LPC synthesis filter 260 which synthesizes a
speech signal based on LPC coefficients, a_i, received over the channel. See G.729 Draft Section 4.1.6.

Finally, the output of the LPC synthesis filter 260 is supplied to a post processor 265 which performs adaptive
postfiltering (see G.729 Draft Sections 4.2.1 - 4.2.4), high-pass filtering (see G.729 Draft Section 4.2.5), and up-scaling
(see G.729 Draft Section 4.2.5).

III. Discussion

Although a number of specific embodiments of this invention have been shown and described herein, it is to be 
understood that these embodiments are merely illustrative of the many possible specific arrangements which can be 
devised in application of the principles of the invention. Numerous and varied other arrangements can be devised in 
accordance with these principles by those of ordinary skill in the art without departing from the scope of the invention. 

For example, should scalar gain quantization be employed, the gain of the PPF may be adapted based on the
current, rather than the previous, ACB gain. Also, the values of the limits on the PPF gain (0.2, 0.8) are merely illustrative.
Other limits, such as 0.1 and 0.7, could suffice.

In addition, although the illustrative embodiment of the present invention refers to codebook "amplifiers," it will be
understood by those of ordinary skill in the art that this term encompasses the scaling of digital signals. Moreover, such
scaling may be accomplished with scale factors (or gains) which are less than or equal to one (including negative
values), as well as greater than one.
1 Introduction

This Recommendation contains the description of an algorithm for the coding of speech signals at 8
kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Predictive (CS-ACELP) coding.

This coder is designed to operate with a digital signal obtained by first performing telephone
bandwidth filtering (ITU Rec. G.710) of the analog input signal, then sampling it at 8000 Hz,
followed by conversion to 16 bit linear PCM for the input to the encoder. The output of the decoder
should be converted back to an analog signal by similar means. Other input/output characteristics,
such as those specified by ITU Rec. G.711 for 64 kbit/s PCM data, should be converted to 16 bit
linear PCM before encoding, or from 16 bit linear PCM to the appropriate format after decoding.
The bitstream from the encoder to the decoder is defined within this standard.

This Recommendation is organized as follows: Section 2 gives a general outline of the CS-ACELP
algorithm. In Sections 3 and 4, the CS-ACELP encoder and decoder principles are discussed,
respectively. Section 5 describes the software that defines this coder in 16 bit fixed point
arithmetic.
2 General description of the coder



The CS-ACELP coder is based on the code-excited linear-predictive (CELP) coding model. The
coder operates on speech frames of 10 ms corresponding to 80 samples at a sampling rate of 8000
samples/sec. For every 10 msec frame, the speech signal is analyzed to extract the parameters of
the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). These
parameters are encoded and transmitted. The bit allocation of the coder parameters is shown in
Table 1. At the decoder, these parameters are used to retrieve the excitation and synthesis filter
parameters. The speech is reconstructed by filtering this excitation through the LP synthesis filter,
as is shown in Figure 1. The short-term synthesis filter is based on a 10th order linear prediction
(LP) filter. The long-term, or pitch synthesis filter is implemented using the so-called adaptive
codebook approach for delays less than the subframe length. After computing the reconstructed
speech, it is further enhanced by a postfilter.

Table 1: Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 msec frame).

Parameter                  Codeword         Subframe 1   Subframe 2   Total per frame
LSP                        L0, L1, L2, L3                             18
Adaptive codebook delay    P1, P2           8            5            13
Delay parity               P0               1                         1
Fixed codebook index       C1, C2           13           13           26
Fixed codebook sign        S1, S2           4            4            8
Codebook gains (stage 1)   GA1, GA2         3            3            6
Codebook gains (stage 2)   GB1, GB2         4            4            8
Total                                                                 80

Figure 1: Block diagram of conceptual CELP synthesis model.
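
The entries of Table 1 can be checked with a few lines of arithmetic. The dictionary below simply restates the per-frame totals of the table; the variable names are hypothetical and the sketch is offered only as a consistency check.

```python
# Per-frame bit counts restated from Table 1 (subframe 1 + subframe 2 where split).
BIT_ALLOCATION = {
    "LSP": 18,
    "Adaptive codebook delay": 8 + 5,
    "Delay parity": 1,
    "Fixed codebook index": 13 + 13,
    "Fixed codebook sign": 4 + 4,
    "Codebook gains (stage 1)": 3 + 3,
    "Codebook gains (stage 2)": 4 + 4,
}

FRAME_MS = 10                      # frame duration in milliseconds

bits_per_frame = sum(BIT_ALLOCATION.values())
bit_rate = bits_per_frame * 1000 // FRAME_MS      # bits per second

# Per-subframe budgets for the fixed codebook (index + sign) and the gains.
fixed_codebook_bits = 13 + 4
gain_bits = 3 + 4
```

The totals come to 80 bits per 10 ms frame, that is 8000 bit/s, with 17 bits per subframe for the fixed codebook and 7 bits per subframe for the two gains.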






EP 0 749 110 A2 



Kroon 4 



2.1 Encoder 



The signal flow at the encoder is shown in Figure 2. The input signal is high-pass filtered and scaled
in the pre-processing block. The pre-processed signal serves as the input signal for all subsequent
analysis. LP analysis is done once per 10 ms frame to compute the LP filter coefficients. These
coefficients are converted to line spectrum pairs (LSP) and quantized using predictive two-stage
vector quantization (VQ) with 18 bits. The excitation sequence is chosen by using an
analysis-by-synthesis search procedure in which the error between the original and synthesized speech is
minimized according to a perceptually weighted distortion measure. This is done by filtering the
error signal with a perceptual weighting filter, whose coefficients are derived from the unquantized
LP filter. The amount of perceptual weighting is made adaptive to improve the performance for
input signals with a flat frequency response.

Figure 2: Signal flow at the CS-ACELP encoder.

The excitation parameters (fixed and adaptive codebook parameters) are determined per
subframe of 5 ms (40 samples) each. The quantized and unquantized LP filter coefficients are used for
the second subframe, while in the first subframe interpolated LP filter coefficients are used (both
quantized and unquantized). An open-loop pitch delay is estimated once per 10 ms frame based
on the perceptually weighted speech signal. Then the following operations are repeated for each
subframe. The target signal x(n) is computed by filtering the LP residual through the weighted
synthesis filter W(z)/Â(z). The initial states of these filters are updated by filtering the error
between LP residual and excitation. This is equivalent to the common approach of subtracting the
zero-input response of the weighted synthesis filter from the weighted speech signal. The impulse
response, h(n), of the weighted synthesis filter is computed. Closed-loop pitch analysis is then
done (to find the adaptive codebook delay and gain), using the target x(n) and impulse response
h(n), by searching around the value of the open-loop pitch delay. A fractional pitch delay with 1/3
resolution is used. The pitch delay is encoded with 8 bits in the first subframe and differentially
encoded with 5 bits in the second subframe. The target signal x(n) is updated by removing the
adaptive codebook contribution (filtered adaptive codevector), and this new target, x_2(n), is used
in the fixed algebraic codebook search (to find the optimum excitation). An algebraic codebook
with 17 bits is used for the fixed codebook excitation. The gains of the adaptive and fixed
codebook are vector quantized with 7 bits (with MA prediction applied to the fixed codebook gain).
Finally, the filter memories are updated using the determined excitation signal.

2.2 Decoder 

The signal flow at the decoder is shown in Figure 3. First, the parameter indices are extracted from
the received bitstream. These indices are decoded to obtain the coder parameters corresponding
to a 10 ms speech frame. These parameters are the LSP coefficients, the 2 fractional pitch delays,
the 2 fixed codebook vectors, and the 2 sets of adaptive and fixed codebook gains. The LSP
coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for
each 40-sample subframe the following steps are done:

• the excitation is constructed by adding the adaptive and fixed codebook vectors scaled by
their respective gains,

• the speech is reconstructed by filtering the excitation through the LP synthesis filter,

• the reconstructed speech signal is passed through a post-processing stage, which consists
of an adaptive postfilter based on the long-term and short-term synthesis filters, followed by
a high-pass filter and scaling operation.

Figure 3: Signal flow at the CS-ACELP decoder.






2.3 Delay 






This coder encodes speech and other audio signals with 10 ms frames. In addition, there is a
look-ahead of 5 ms, resulting in a total algorithmic delay of 15 ms. All additional delays in a
practical implementation of this coder are due to:






• processing time needed for encoding and decoding operations, 

• transmission time on the communication link, 

• multiplexing delay when combining audio data with other data. 






2.4 Speech coder description 






The description of the speech coding algorithm of this Recommendation is made in terms of
bit-exact, fixed-point mathematical operations. The ANSI C code indicated in Section 5, which
constitutes an integral part of this Recommendation, reflects this bit-exact, fixed-point descriptive
approach. The mathematical descriptions of the encoder (Section 3) and decoder (Section 4) can
be implemented in several other fashions, possibly leading to a codec implementation not complying
with this Recommendation. Therefore, the algorithm description of the C code of Section 5 shall
take precedence over the mathematical descriptions of Sections 3 and 4 whenever discrepancies are
found. A non-exhaustive set of test sequences which can be used in conjunction with the C code
is available from the ITU.
2.5 Notational conventions

Throughout this document the following notational conventions are maintained.

• Codebooks are denoted by calligraphic characters (e.g. C).

• Time signals are denoted by the symbol and the sample time index between parentheses (e.g.
s(n)). The symbol n is used as sample instant index.

• Superscript time indices (e.g. g^(m)) refer to that variable corresponding to subframe m.

• Subscripts identify a particular element in a coefficient array.

• A ^ identifies a quantized version of a parameter.

• Range notations are done using square brackets, where the boundaries are included (e.g.
[0.6, 0.9]).

• log denotes a logarithm with base 10. 

Table 2 lists the most relevant symbols used throughout this document. A glossary of the most
relevant signals is given in Table 3. Table 4 summarizes relevant variables and their dimension.
Constant parameters are listed in Table 5. The acronyms used in this Recommendation are
summarized in Table 6.

Table 2: Glossary of symbols.

Name      Reference   Description
1/Â(z)    Eq. (1)     LP synthesis filter
H_h1(z)   Eq. (1)     input high-pass filter
H_p(z)    Eq. (77)    pitch postfilter
H_f(z)    Eq. (83)    short-term postfilter
H_t(z)    Eq. (83)    tilt-compensation filter
H_h2(z)   Eq. (90)    output high-pass filter
P(z)      Eq. (46)    pitch filter
W(z)      Eq. (27)    weighting filter



Table 3: Glossary of signals.

Name           Description
$h(n)$         impulse response of weighting and synthesis filters
$r(k)$         autocorrelation sequence
$r'(k)$        modified autocorrelation sequence
$R(k)$         correlation sequence
$sw(n)$        weighted speech signal
$s(n)$         speech signal
$s'(n)$        windowed speech signal
$sf(n)$        postfiltered output
$sf'(n)$       gain-scaled postfiltered output
$\hat{s}(n)$   reconstructed speech signal
$r(n)$         residual signal
$x(n)$         target signal
$x_2(n)$       second target signal
$v(n)$         adaptive codebook contribution
$c(n)$         fixed codebook contribution
$y(n)$         $v(n) * h(n)$
$z(n)$         $c(n) * h(n)$
$u(n)$         excitation to LP synthesis filter
$d(n)$         correlation between target signal and $h(n)$
$ew(n)$        error signal

Table 4: Glossary of variables.

Name          Size   Description
$g_p$         1      adaptive codebook gain
$g_c$         1      fixed codebook gain
$g_0$         1      modified gain for pitch postfilter
$g_{pit}$     1      pitch gain for pitch postfilter
$g_f$         1      gain term short-term postfilter
$g_t$         1      gain term tilt postfilter
$T_{op}$      1      open-loop pitch delay
$a_i$         10     LP coefficients
$k_i$         10     reflection coefficients
$o_i$         2      LAR coefficients
$\omega_i$    10     LSF normalized frequencies
$q_i$         10     LSP coefficients
$r(k)$        11     correlation coefficients
$w_i$         10     LSP weighting coefficients
$l_i$         10     LSP quantizer output
Table 5: Glossary of constants.

Name              Value            Description
$f_s$             8000             sampling frequency
$f_0$             60               bandwidth expansion
$\gamma_1$        0.94/0.98        weight factor perceptual weighting filter
$\gamma_2$        0.60/[0.4-0.7]   weight factor perceptual weighting filter
$\gamma_n$        0.55             weight factor postfilter
$\gamma_d$        0.70             weight factor postfilter
$\gamma_p$        0.50             weight factor pitch postfilter
$\gamma_t$        0.90/0.2         weight factor tilt postfilter
$\mathcal{C}$     Table 7          fixed (algebraic) codebook
$\mathcal{L}0$    Section 3.2.4    moving average predictor codebook
$\mathcal{L}1$    Section 3.2.4    first stage LSP codebook
$\mathcal{L}2$    Section 3.2.4    second stage LSP codebook (low part)
$\mathcal{L}3$    Section 3.2.4    second stage LSP codebook (high part)
$\mathcal{GA}$    Section 3.9      first stage gain codebook
$\mathcal{GB}$    Section 3.9      second stage gain codebook
$w_{lag}$         Eq. (6)          correlation lag window
$w_{lp}$          Eq. (3)          LPC analysis window



Table 6: Glossary of acronyms.

Acronym   Description
CELP      code-excited linear prediction
MA        moving average
MSB       most significant bit
LP        linear prediction
LSP       line spectral pair
LSF       line spectral frequency
VQ        vector quantization
3 Functional description of the encoder

In this section we describe the different functions of the encoder represented in the blocks of
Figure 1.



3.1 Pre-processing 

As stated in Section 2, the input to the speech encoder is assumed to be a 16-bit PCM signal.
Two pre-processing functions are applied before the encoding process: 1) signal scaling, and 2)
high-pass filtering.

The scaling consists of dividing the input by a factor 2 to reduce the possibility of overflows
in the fixed-point implementation. The high-pass filter serves as a precaution against undesired
low-frequency components. A second-order pole/zero filter with a cutoff frequency of 140 Hz is
used. Both the scaling and high-pass filtering are combined by dividing the coefficients at the
numerator of this filter by 2. The resulting filter is given by

$$ H_{h1}(z) = \frac{0.46363718 - 0.92724705 z^{-1} + 0.46363718 z^{-2}}{1 - 1.9059465 z^{-1} + 0.9114024 z^{-2}} \tag{1} $$

The input signal filtered through $H_{h1}(z)$ is referred to as $s(n)$, and will be used in all subsequent
coder operations.
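
The pre-processing filter can be illustrated with a short floating-point sketch. It is not the bit-exact fixed-point routine of the Recommendation; the function name `preprocess` and the direct-form structure are assumptions made for illustration, and only the coefficients are taken from Eq. (1).

```python
# Coefficients from Eq. (1); the numerator already contains the divide-by-2 scaling.
B = (0.46363718, -0.92724705, 0.46363718)
A = (1.0, -1.9059465, 0.9114024)

def preprocess(pcm):
    """Scale and high-pass filter PCM samples (hypothetical floating-point sketch)."""
    x1 = x2 = y1 = y2 = 0.0          # filter memories, zero at start-up
    out = []
    for x0 in pcm:
        # Direct-form I difference equation of the second-order pole/zero filter.
        y0 = B[0] * x0 + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

Feeding a constant (DC) input shows the intended behaviour: after the start-up transient the output settles close to zero, because the numerator coefficients almost cancel at zero frequency.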



3.2 Linear prediction analysis and quantization 

The short-term analysis and synthesis filters are based on 10th order linear prediction (LP) filters.
The LP synthesis filter is defined as

$$ \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}}, \tag{2} $$

where $\hat{a}_i$, $i = 1, \ldots, 10$, are the (quantized) linear prediction (LP) coefficients. Short-term
prediction, or linear prediction analysis, is performed once per speech frame using the autocorrelation
approach with a 30 ms asymmetric window. Every 80 samples (10 ms), the autocorrelation coefficients
of windowed speech are computed and converted to the LP coefficients using the Levinson
algorithm. Then the LP coefficients are transformed to the LSP domain for quantization and
interpolation purposes. The interpolated quantized and unquantized filters are converted back to
the LP filter coefficients (to construct the synthesis and weighting filters at each subframe).
3.2.1 Windowing and autocorrelation computation

The LP analysis window consists of two parts: the first part is half a Hamming window and the
second part is a quarter of a cosine function cycle. The window is given by:

$$ w_{lp}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2\pi n}{399}\right), & n = 0, \ldots, 199, \\ \cos\left(\dfrac{2\pi (n - 200)}{159}\right), & n = 200, \ldots, 239. \end{cases} \tag{3} $$

There is a 5 ms lookahead in the LP analysis which means that 40 samples are needed from the
future speech frame. This translates into an extra delay of 5 ms at the encoder stage. The LP
analysis window applies to 120 samples from past speech frames, 80 samples from the present
speech frame, and 40 samples from the future frame. The windowing in LP analysis is illustrated
in Figure 4.

Figure 4: Windowing in LP analysis. The different shading patterns identify corresponding
excitation and LP analysis frames.

The autocorrelation coefficients of the windowed speech

$$ s'(n) = w_{lp}(n)\, s(n), \quad n = 0, \ldots, 239, \tag{4} $$

are computed by

$$ r(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \quad k = 0, \ldots, 10. \tag{5} $$

To avoid arithmetic problems for low-level input signals the value of $r(0)$ has a lower boundary of
$r(0) = 1.0$. A 60 Hz bandwidth expansion is applied, by multiplying the autocorrelation coefficients
with

$$ w_{lag}(k) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 k}{f_s}\right)^2\right], \quad k = 1, \ldots, 10, \tag{6} $$

where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8000$ Hz is the sampling frequency. Further,
$r(0)$ is multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise
floor at -40 dB.
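
As an illustration of Eqs. (5) and (6) together with the lower bound on $r(0)$ and the white noise correction, the following floating-point sketch may help. The function names are hypothetical, and the input is assumed to be the already windowed speech $s'(n)$.

```python
import math

F0 = 60.0     # bandwidth expansion in Hz
FS = 8000.0   # sampling frequency in Hz

def lag_window(k):
    """Lag window of Eq. (6)."""
    return math.exp(-0.5 * (2.0 * math.pi * F0 * k / FS) ** 2)

def modified_autocorrelation(sp, order=10):
    """Return r'(0..order) from windowed speech sp (hypothetical helper)."""
    n = len(sp)
    # Eq. (5): plain autocorrelation of the windowed speech.
    r = [sum(sp[i] * sp[i - k] for i in range(k, n)) for k in range(order + 1)]
    r[0] = max(r[0], 1.0)                      # lower boundary on r(0)
    rp = [1.0001 * r[0]]                       # white noise correction
    rp += [lag_window(k) * r[k] for k in range(1, order + 1)]
    return rp
```

The factor 1.0001 adds 0.0001 times the signal energy to $r(0)$, which corresponds to $10 \log 0.0001 = -40$ dB.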
3.2.2 Levinson-Durbin algorithm

The modified autocorrelation coefficients

$$ r'(0) = 1.0001\, r(0), \qquad r'(k) = w_{lag}(k)\, r(k), \quad k = 1, \ldots, 10, \tag{7} $$

are used to obtain the LP filter coefficients $a_i$, $i = 1, \ldots, 10$, by solving the set of equations

$$ \sum_{i=1}^{10} a_i\, r'(|i - k|) = -r'(k), \quad k = 1, \ldots, 10. \tag{8} $$

The set of equations in (8) is solved using the Levinson-Durbin algorithm. This algorithm uses the
following recursion:

    E(0) = r'(0)
    for i = 1 to 10
        a_0^(i-1) = 1
        k_i = -[ sum_{j=0}^{i-1} a_j^(i-1) r'(i-j) ] / E(i-1)
        a_i^(i) = k_i
        for j = 1 to i-1
            a_j^(i) = a_j^(i-1) + k_i a_{i-j}^(i-1)
        end
        E(i) = (1 - k_i^2) E(i-1),   if E(i) < 0 then E(i) = 0.01
    end

The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$.
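
A floating-point sketch of this recursion is given below. It follows the sign convention of Eq. (8), in which $A(z) = 1 + \sum a_i z^{-i}$; the function name and the returned tuple are assumptions for illustration and do not reproduce the fixed-point routine.

```python
def levinson_durbin(rp, order=10):
    """Solve Eq. (8) for a[1..order]; returns (a, reflection coefficients, error)."""
    a = [1.0] + [0.0] * order        # a[0] = 1 by convention
    refl = []
    err = rp[0]                      # E(0) = r'(0)
    for i in range(1, order + 1):
        acc = sum(a[j] * rp[i - j] for j in range(i))
        k = -acc / err
        prev = a[:]                  # coefficients of the order i-1 predictor
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        err = (1.0 - k * k) * err
        if err < 0.0:                # safeguard stated in the recursion above
            err = 0.01
        refl.append(k)
    return a, refl, err
```

For the autocorrelation of a first-order process, `levinson_durbin([1.0, 0.5, 0.25], order=2)` yields a first coefficient of -0.5 and a second coefficient of zero, as expected.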

3.2.3 LP to LSP conversion

The LP filter coefficients $a_i$, $i = 1, \ldots, 10$, are converted to the line spectral pair (LSP)
representation for quantization and interpolation purposes. For a 10th order LP filter, the LSP
coefficients are defined as the roots of the sum and difference polynomials

$$ F_1'(z) = A(z) + z^{-11} A(z^{-1}), \tag{9} $$

and

$$ F_2'(z) = A(z) - z^{-11} A(z^{-1}), \tag{10} $$

respectively. The polynomial $F_1'(z)$ is symmetric, and $F_2'(z)$ is antisymmetric. It can be proven
that all roots of these polynomials are on the unit circle and they alternate each other. $F_1'(z)$ has
a root $z = -1$ ($\omega = \pi$) and $F_2'(z)$ has a root $z = 1$ ($\omega = 0$). To eliminate these two roots, we define
the new polynomials

$$ F_1(z) = F_1'(z) / (1 + z^{-1}), \tag{11} $$

and

$$ F_2(z) = F_2'(z) / (1 - z^{-1}). \tag{12} $$

Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j\omega_i}$), therefore, the polynomials can
be written as

$$ F_1(z) = \prod_{i=1,3,\ldots,9} (1 - 2 q_i z^{-1} + z^{-2}) \tag{13} $$

and

$$ F_2(z) = \prod_{i=2,4,\ldots,10} (1 - 2 q_i z^{-1} + z^{-2}), \tag{14} $$

where $q_i = \cos(\omega_i)$ with $\omega_i$ being the line spectral frequencies (LSF) and they satisfy the ordering
property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSP coefficients in the cosine domain.

Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric only the first 5 coefficients of each
polynomial need to be computed. The coefficients of these polynomials are found by the recursive
relations

$$ \begin{aligned} f_1(i+1) &= a_{i+1} + a_{10-i} - f_1(i), & i &= 0, \ldots, 4, \\ f_2(i+1) &= a_{i+1} - a_{10-i} + f_2(i), & i &= 0, \ldots, 4, \end{aligned} \tag{15} $$

where $f_1(0) = f_2(0) = 1.0$. The LSP coefficients are found by evaluating the polynomials $F_1(z)$
and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$ and checking for sign changes. A sign
change signifies the existence of a root and the sign change interval is then divided 4 times to
better track the root. The Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$. In this
method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ or $F_2(z)$,
evaluated at $z = e^{j\omega}$, can be written as

$$ F(\omega) = 2 e^{-j5\omega} C(x), \tag{16} $$

with

$$ C(x) = T_5(x) + f(1) T_4(x) + f(2) T_3(x) + f(3) T_2(x) + f(4) T_1(x) + f(5)/2, \tag{17} $$

where $T_m(x) = \cos(m\omega)$ is the mth order Chebyshev polynomial, and $f(i)$, $i = 1, \ldots, 5$, are the
coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (15). The polynomial $C(x)$
is evaluated at a certain value of $x = \cos(\omega)$ using the recursive relation:

    for k = 4 downto 1
        b_k = 2x b_{k+1} - b_{k+2} + f(5-k)
    end
    C(x) = x b_1 - b_2 + f(5)/2

with initial values $b_5 = 1$ and $b_6 = 0$.
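
The conversion described in this section can be sketched in floating point as follows. The function names are hypothetical; the grid of 60 points, the 4 interval divisions and the recursions come from the text, while taking the midpoint of the final interval as the root estimate is an assumption of this sketch.

```python
import math

def sum_diff_coeffs(a):
    """Eq. (15): a[0] = 1 and a[1..10] are the LP coefficients."""
    f1, f2 = [1.0], [1.0]
    for i in range(5):
        f1.append(a[i + 1] + a[10 - i] - f1[i])
        f2.append(a[i + 1] - a[10 - i] + f2[i])
    return f1, f2

def cheb_eval(x, f):
    """Evaluate C(x) of Eq. (17) with the recursion b_k = 2x b_{k+1} - b_{k+2} + f(5-k)."""
    b1, b2 = 1.0, 0.0                       # b_5 and b_6
    for k in range(4, 0, -1):
        b1, b2 = 2.0 * x * b1 - b2 + f[5 - k], b1
    return x * b1 - b2 + f[5] / 2.0

def lp_to_lsf(a, grid=60, divisions=4):
    """Return the 10 line spectral frequencies (radians) in increasing order."""
    roots = []
    for f in sum_diff_coeffs(a):
        x_prev, y_prev = 1.0, cheb_eval(1.0, f)
        for j in range(1, grid + 1):
            x_cur = math.cos(math.pi * j / grid)
            y_cur = cheb_eval(x_cur, f)
            if y_prev * y_cur < 0.0:        # sign change: a root lies in between
                lo, hi, y_lo = x_prev, x_cur, y_prev
                for _ in range(divisions):
                    mid = 0.5 * (lo + hi)
                    y_mid = cheb_eval(mid, f)
                    if y_lo * y_mid <= 0.0:
                        hi = mid
                    else:
                        lo, y_lo = mid, y_mid
                roots.append(math.acos(0.5 * (lo + hi)))
            x_prev, y_prev = x_cur, y_cur
    return sorted(roots)
```

For $A(z) = 1$ the sum and difference polynomials reduce to $1 \pm z^{-11}$, so the sketch should return frequencies close to $i\pi/11$, $i = 1, \ldots, 10$.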

3.2.4 Quantization of the LSP coefficients 

The LP filter coefficients are quantized using the LSP representation in the frequency domain; that
is

$$ \omega_i = \arccos(q_i), \quad i = 1, \ldots, 10, \tag{18} $$

where $\omega_i$ are the line spectral frequencies (LSF) in the normalized frequency domain $[0, \pi]$. A
switched 4th order MA prediction is used to predict the current set of LSF coefficients. The
difference between the computed and predicted set of coefficients is quantized using a two-stage
vector quantizer. The first stage is a 10-dimensional VQ using codebook $\mathcal{L}1$ with 128 entries (7
bits). The second stage is a 10 bit VQ which has been implemented as a split VQ using two
5-dimensional codebooks, $\mathcal{L}2$ and $\mathcal{L}3$, containing 32 entries (5 bits) each.

To explain the quantization process, it is convenient to first describe the decoding process.
Each coefficient is obtained from the sum of 2 codebooks:

$$ l_i = \begin{cases} \mathcal{L}1_i(L1) + \mathcal{L}2_i(L2), & i = 1, \ldots, 5, \\ \mathcal{L}1_i(L1) + \mathcal{L}3_{(i-5)}(L3), & i = 6, \ldots, 10, \end{cases} \tag{19} $$

where L1, L2, and L3 are the codebook indices. To avoid sharp resonances in the quantized LP
synthesis filters, the coefficients $l_i$ are arranged such that adjacent coefficients have a minimum
distance of $J$. The rearrangement routine is shown below:

    for i = 2, ..., 10
        if (l_{i-1} > l_i - J)
            l_{i-1} = (l_i + l_{i-1} - J)/2
        end
    end

This rearrangement process is executed twice. First with a value of $J = 0.0001$, then with a value
of $J = 0.000095$.

After this rearrangement process, the quantized LSF coefficients $\hat{\omega}_i^{(m)}$ for the current frame $m$
are obtained from the weighted sum of previous quantizer outputs $l_i^{(m-k)}$, and the current quantizer
output $l_i^{(m)}$

$$ \hat{\omega}_i^{(m)} = \left(1 - \sum_{k=1}^{4} m_i^k\right) l_i^{(m)} + \sum_{k=1}^{4} m_i^k\, l_i^{(m-k)}, \quad i = 1, \ldots, 10, \tag{20} $$

where $m_i^k$ are the coefficients of the switched MA predictor. Which MA predictor to use is defined
by a separate bit L0. At startup the initial values of $l_i^{(k)}$ are given by $l_i = i\pi/11$ for all $k < 0$.

After computing $\hat{\omega}_i$, the corresponding filter is checked for stability. This is done as follows:

1. Order the coefficients $\hat{\omega}_i$ in increasing value,
2. If $\hat{\omega}_1 < 0.005$ then $\hat{\omega}_1 = 0.005$,
3. If $\hat{\omega}_{i+1} - \hat{\omega}_i < 0.0001$, then $\hat{\omega}_{i+1} = \hat{\omega}_i + 0.0001$, $i = 1, \ldots, 9$,
4. If $\hat{\omega}_{10} > 3.135$ then $\hat{\omega}_{10} = 3.135$.
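
The four stability steps translate directly into a few lines of code. The sketch below is a floating-point illustration; the function name is hypothetical.

```python
def stabilize_lsf(w):
    """Apply the four stability steps to a list of quantized LSFs (radians)."""
    w = sorted(w)                        # step 1: increasing order
    if w[0] < 0.005:                     # step 2: lower limit on the first LSF
        w[0] = 0.005
    for i in range(len(w) - 1):          # step 3: minimum spacing of 0.0001
        if w[i + 1] - w[i] < 0.0001:
            w[i + 1] = w[i] + 0.0001
    if w[-1] > 3.135:                    # step 4: upper limit on the last LSF
        w[-1] = 3.135
    return w
```

Step 3 is applied in increasing index order, so a correction to one coefficient is taken into account when the next pair is checked.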



The procedure for encoding the LSF parameters can be outlined as follows. For each of the
two MA predictors the best approximation to the current LSF vector has to be found. The best
approximation is defined as the one that minimizes a weighted mean-squared error

$$ E_{LPC} = \sum_{i=1}^{10} w_i (\omega_i - \hat{\omega}_i)^2. \tag{21} $$

The weights $w_i$ are made adaptive as a function of the unquantized LSF coefficients,

$$ \begin{aligned} w_1 &= \begin{cases} 1.0 & \text{if } \omega_2 - 0.04\pi - 1 > 0, \\ 10(\omega_2 - 0.04\pi - 1)^2 + 1 & \text{otherwise,} \end{cases} \\ w_i,\ 2 \le i \le 9 &= \begin{cases} 1.0 & \text{if } \omega_{i+1} - \omega_{i-1} - 1 > 0, \\ 10(\omega_{i+1} - \omega_{i-1} - 1)^2 + 1 & \text{otherwise,} \end{cases} \\ w_{10} &= \begin{cases} 1.0 & \text{if } -\omega_9 + 0.92\pi - 1 > 0, \\ 10(-\omega_9 + 0.92\pi - 1)^2 + 1 & \text{otherwise.} \end{cases} \end{aligned} \tag{22} $$

In addition, the weights $w_5$ and $w_6$ are multiplied by 1.2 each.
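
The weight computation can be sketched as follows, assuming the three cases of Eq. (22) share the same form: 1.0 when the tested quantity is positive, and 10 times its square plus 1 otherwise. The function name is hypothetical and the list index 0 corresponds to $\omega_1$.

```python
import math

def lsf_weights(w):
    """Adaptive weights of Eq. (22) from 10 unquantized LSFs (radians)."""
    def weight(gap):
        return 1.0 if gap > 0.0 else 10.0 * gap * gap + 1.0
    out = [weight(w[1] - 0.04 * math.pi - 1.0)]                       # w_1
    out += [weight(w[i + 1] - w[i - 1] - 1.0) for i in range(1, 9)]   # w_2 .. w_9
    out.append(weight(-w[8] + 0.92 * math.pi - 1.0))                  # w_10
    out[4] *= 1.2                                                     # w_5
    out[5] *= 1.2                                                     # w_6
    return out
```

Closely spaced LSFs, which indicate spectral peaks, therefore receive weights larger than 1.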

The vector to be quantized for the current frame is obtained from

$$ l_i = \left[\omega_i^{(m)} - \sum_{k=1}^{4} m_i^k\, l_i^{(m-k)}\right] \Big/ \left(1 - \sum_{k=1}^{4} m_i^k\right), \quad i = 1, \ldots, 10. \tag{23} $$

The first codebook $\mathcal{L}1$ is searched and the entry L1 that minimizes the (unweighted) mean-squared
error is selected. This is followed by a search of the second codebook $\mathcal{L}2$, which defines
the lower part of the second stage. For each possible candidate, the partial vector $\hat{\omega}_i$, $i = 1, \ldots, 5$,
is reconstructed using Eq. (20), and rearranged to guarantee a minimum distance of 0.0001. The
vector with index L2 which, after addition to the first stage candidate and rearranging, approximates
the lower part of the corresponding target best in the weighted MSE sense is selected. Using the
selected first stage vector L1 and the lower part of the second stage (L2), the higher part of
the second stage is searched from codebook $\mathcal{L}3$. Again the rearrangement procedure is used to
guarantee a minimum distance of 0.0001. The vector L3 that minimizes the overall weighted MSE
is selected.

This process is done for each of the two MA predictors defined by $\mathcal{L}0$, and the MA predictor
L0 that produces the lowest weighted MSE is selected.

3.2.5 Interpolation of the LSP coefficients

The quantized (and unquantized) LP coefficients are used for the second subframe. For the first
subframe, the quantized (and unquantized) LP coefficients are obtained from linear interpolation
of the corresponding parameters in the adjacent subframes. The interpolation is done on the LSP
coefficients in the q domain. Let $q_i^{(m)}$ be the LSP coefficients at the 2nd subframe of frame $m$, and
$q_i^{(m-1)}$ the LSP coefficients at the 2nd subframe of the past frame $(m - 1)$. The (unquantized)
interpolated LSP coefficients in each of the 2 subframes are given by

$$ \begin{aligned} \text{Subframe 1}: \quad q1_i &= 0.5\, q_i^{(m-1)} + 0.5\, q_i^{(m)}, & i &= 1, \ldots, 10, \\ \text{Subframe 2}: \quad q2_i &= q_i^{(m)}, & i &= 1, \ldots, 10. \end{aligned} \tag{24} $$

The same interpolation procedure is used for the interpolation of the quantized LSP coefficients
by substituting $q_i$ by $\hat{q}_i$ in Eq. (24).
3.2.6 LSP to LP

Once the LSP coefficients are quantized and interpolated, they are converted back to LP coefficients
$\{a_i\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ and $F_2(z)$ are
found by expanding Eqs. (13) and (14) knowing the quantized and interpolated LSP coefficients.
The following recursive relation is used to compute $f_1(i)$, $i = 1, \ldots, 5$, from $q_i$:

    for i = 1 to 5
        f_1(i) = -2 q_{2i-1} f_1(i-1) + 2 f_1(i-2)
        for j = i-1 downto 1
            f_1(j) = f_1(j) - 2 q_{2i-1} f_1(j-1) + f_1(j-2)
        end
    end

with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by
replacing $q_{2i-1}$ by $q_{2i}$.

Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and
$1 - z^{-1}$, respectively, to obtain $F_1'(z)$ and $F_2'(z)$; that is

$$ \begin{aligned} f_1'(i) &= f_1(i) + f_1(i-1), & i &= 1, \ldots, 5, \\ f_2'(i) &= f_2(i) - f_2(i-1), & i &= 1, \ldots, 5. \end{aligned} \tag{25} $$

Finally the LP coefficients are found by

$$ a_i = \begin{cases} 0.5 f_1'(i) + 0.5 f_2'(i), & i = 1, \ldots, 5, \\ 0.5 f_1'(11 - i) - 0.5 f_2'(11 - i), & i = 6, \ldots, 10. \end{cases} \tag{26} $$

This is directly derived from the relation $A(z) = (F_1'(z) + F_2'(z))/2$, and because $F_1'(z)$ and $F_2'(z)$
are symmetric and antisymmetric polynomials, respectively.
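
A floating-point sketch of the LSP to LP conversion is shown below. The function names are hypothetical. The second half of the coefficients is obtained from the symmetry of $F_1'(z)$ and the antisymmetry of $F_2'(z)$, that is, from the coefficients at index $11 - i$.

```python
def expand_polynomial(q_sub):
    """Expand prod (1 - 2 q z^-1 + z^-2) over 5 LSPs; returns f(0..5)."""
    f = [0.0] * 7            # f[j + 1] stores f(j), so f[0] is f(-1) = 0
    f[1] = 1.0               # f(0) = 1
    for i in range(1, 6):
        q = q_sub[i - 1]
        f[i + 1] = -2.0 * q * f[i] + 2.0 * f[i - 1]
        for j in range(i - 1, 0, -1):
            f[j + 1] += -2.0 * q * f[j] + f[j - 1]
    return f[1:]

def lsp_to_lp(q):
    """q[0..9] are q_1..q_10 in the cosine domain; returns [1, a_1, ..., a_10]."""
    f1 = expand_polynomial(q[0::2])      # q_1, q_3, ..., q_9
    f2 = expand_polynomial(q[1::2])      # q_2, q_4, ..., q_10
    f1p = [f1[i] + f1[i - 1] for i in range(1, 6)]   # Eq. (25), i = 1..5
    f2p = [f2[i] - f2[i - 1] for i in range(1, 6)]
    a = [1.0]
    a += [0.5 * f1p[i - 1] + 0.5 * f2p[i - 1] for i in range(1, 6)]
    a += [0.5 * f1p[10 - i] - 0.5 * f2p[10 - i] for i in range(6, 11)]  # index 11 - i
    return a
```

With the start-up values $\omega_i = i\pi/11$ the conversion returns $A(z) = 1$, that is, all $a_i$ are (numerically) zero.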



3.3 Perceptual weighting 

The perceptual weighting filter is based on the unquantized LP filter coefficients and is given by

$$ W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 + \sum_{i=1}^{10} \gamma_1^i a_i z^{-i}}{1 + \sum_{i=1}^{10} \gamma_2^i a_i z^{-i}}. \tag{27} $$

The values of $\gamma_1$ and $\gamma_2$ determine the frequency response of the filter $W(z)$. By proper adjustment
of these variables it is possible to make the weighting more effective. This is accomplished by
making $\gamma_1$ and $\gamma_2$ a function of the spectral shape of the input signal. This adaptation is done
once per 10 ms frame, but an interpolation procedure for each first subframe is used to smooth
this adaptation process. The spectral shape is obtained from a 2nd-order linear prediction filter,
obtained as a by-product from the Levinson-Durbin recursion (Section 3.2.2). The reflection
coefficients $k_i$ are converted to Log Area Ratio (LAR) coefficients $o_i$ by

$$ o_i = \log \frac{(1.0 + k_i)}{(1.0 - k_i)}, \quad i = 1, 2. \tag{28} $$

These LAR coefficients are used for the second subframe. The LAR coefficients for the first
subframe are obtained through linear interpolation with the LAR parameters from the previous
frame, and are given by:

$$ \begin{aligned} \text{Subframe 1}: \quad o1_i &= 0.5\, o_i^{(m-1)} + 0.5\, o_i^{(m)}, & i &= 1, 2, \\ \text{Subframe 2}: \quad o2_i &= o_i^{(m)}, & i &= 1, 2. \end{aligned} \tag{29} $$

The spectral envelope is characterized as being either flat ($flat = 1$) or tilted ($flat = 0$). For each
subframe this characterization is obtained by applying a threshold function to the LAR coefficients.
To avoid rapid changes, a hysteresis is used by taking into account the value of $flat$ in the previous
subframe $(m - 1)$,

$$ flat^{(m)} = \begin{cases} 0 & \text{if } o_1 < -1.74 \text{ and } o_2 > 0.65 \text{ and } flat^{(m-1)} = 1, \\ 1 & \text{if } o_1 > -1.52 \text{ and } o_2 < 0.43 \text{ and } flat^{(m-1)} = 0, \\ flat^{(m-1)} & \text{otherwise.} \end{cases} \tag{30} $$

If the interpolated spectrum for a subframe is classified as flat ($flat^{(m)} = 1$), the weight factors
are set to $\gamma_1 = 0.94$ and $\gamma_2 = 0.6$. If the spectrum is classified as tilted ($flat^{(m)} = 0$), the value
of $\gamma_1$ is set to 0.98, and the value of $\gamma_2$ is adapted to the strength of the resonances in the LP
synthesis filter, but is bounded between 0.4 and 0.7. If a strong resonance is present, the value
of $\gamma_2$ is closer to the upper bound. This adaptation is achieved by a criterion based on the
minimum distance between 2 successive LSP coefficients for the current subframe. The minimum
distance is given by

$$ d_{min} = \min[\omega_{i+1} - \omega_i], \quad i = 1, \ldots, 9. \tag{31} $$

The following linear relation is used to compute $\gamma_2$:

$$ \gamma_2 = -6.0 \cdot d_{min} + 1.0, \quad \text{and } 0.4 \le \gamma_2 \le 0.7. \tag{32} $$
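
The adaptation of the weight factors can be summarized in a short sketch covering the flat/tilted decision with hysteresis and the computation of $\gamma_2$ from the minimum LSF distance. The function names are hypothetical.

```python
def update_flat(o1, o2, flat_prev):
    """Flat/tilted decision with hysteresis, Eq. (30)."""
    if o1 < -1.74 and o2 > 0.65 and flat_prev == 1:
        return 0
    if o1 > -1.52 and o2 < 0.43 and flat_prev == 0:
        return 1
    return flat_prev

def weight_factors(flat, lsf):
    """Return (gamma1, gamma2) for a subframe from the flag and the LSFs."""
    if flat == 1:
        return 0.94, 0.6
    d_min = min(lsf[i + 1] - lsf[i] for i in range(len(lsf) - 1))   # Eq. (31)
    gamma2 = -6.0 * d_min + 1.0                                      # Eq. (32)
    return 0.98, min(0.7, max(0.4, gamma2))
```

Because the two thresholds in Eq. (30) differ, LAR values that fall between them leave the previous decision unchanged.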



The weighted speech signal in a subframe is given by

$$ sw(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n - i) - \sum_{i=1}^{10} a_i \gamma_2^i\, sw(n - i), \quad n = 0, \ldots, 39. \tag{33} $$

The weighted speech signal $sw(n)$ is used to find an estimation of the pitch delay in the speech
frame.



3.4 Open-loop pitch analysis

To reduce the complexity of the search for the best adaptive codebook delay, the search range is
limited around a candidate delay $T_{op}$, obtained from an open-loop pitch analysis. This open-loop
pitch analysis is done once per frame (10 ms). The open-loop pitch estimation uses the weighted
speech signal $sw(n)$ of Eq. (33), and is done as follows: In the first step, 3 maxima of the correlation

$$ R(k) = \sum_{n=0}^{79} sw(n)\, sw(n - k) \tag{34} $$

are found in the following three ranges

    i = 1:   80, ..., 143,
    i = 2:   40, ..., 79,
    i = 3:   20, ..., 39.

The retained maxima $R(t_i)$, $i = 1, \ldots, 3$, are normalized through

$$ R'(t_i) = \frac{R(t_i)}{\sqrt{\sum_n sw^2(n - t_i)}}, \quad i = 1, \ldots, 3. \tag{35} $$

The winner among the three normalized correlations is selected by favoring the delays with the
values in the lower range. This is done by weighting the normalized correlations corresponding to
the longer delays. The best open-loop delay $T_{op}$ is determined as follows:

    T_op = t_1
    R'(T_op) = R'(t_1)
    if R'(t_2) > 0.85 R'(T_op)
        R'(T_op) = R'(t_2)
        T_op = t_2
    end
    if R'(t_3) > 0.85 R'(T_op)
        R'(T_op) = R'(t_3)
        T_op = t_3
    end

This procedure of dividing the delay range into 3 sections and favoring the lower sections is
used to avoid choosing pitch multiples.
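
The open-loop search can be sketched as follows. The buffer layout (143 past samples followed by the 80 samples of the current frame), the summation over the 80 samples of the frame, and the function name are assumptions of this floating-point illustration.

```python
import math

PAST = 143    # history samples stored before sw(0)
FRAME = 80    # samples per 10 ms frame

def open_loop_pitch(buf):
    """buf[PAST + n] holds sw(n) for n = -143, ..., 79; returns T_op."""
    def corr(k):
        return sum(buf[PAST + n] * buf[PAST + n - k] for n in range(FRAME))

    def normalized(k):
        energy = sum(buf[PAST + n - k] ** 2 for n in range(FRAME))
        return corr(k) / math.sqrt(energy) if energy > 0.0 else 0.0

    candidates = []
    for lo, hi in ((80, 143), (40, 79), (20, 39)):
        t = max(range(lo, hi + 1), key=corr)        # maximum of R(k) in the range
        candidates.append((t, normalized(t)))       # normalization of Eq. (35)

    t_op, r_op = candidates[0]
    for t, r in candidates[1:]:                     # favor the shorter delays
        if r > 0.85 * r_op:
            t_op, r_op = t, r
    return t_op
```

For a pulse train with a period of 60 samples, both 60 and its multiple 120 give the same normalized correlation, and the procedure selects 60.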



3.5 Computation of the impulse response 

The impulse response, $h(n)$, of the weighted synthesis filter $W(z)/\hat{A}(z)$ is computed for each
subframe. This impulse response is needed for the search of adaptive and fixed codebooks. The
impulse response $h(n)$ is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$
extended by zeros through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$.

3.6 Computation of the target signal

The target signal $x(n)$ for the adaptive codebook search is usually computed by subtracting the
zero-input response of the weighted synthesis filter $W(z)/\hat{A}(z) = A(z/\gamma_1)/[\hat{A}(z) A(z/\gamma_2)]$ from the
weighted speech signal $sw(n)$ of Eq. (33). This is done on a subframe basis.

An equivalent procedure for computing the target signal, which is used in this Recommendation,
is the filtering of the LP residual signal $r(n)$ through the combination of synthesis filter $1/\hat{A}(z)$
and the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$. After determining the excitation for the subframe, the
initial states of these filters are updated by filtering the difference between the LP residual and
excitation. The memory update of these filters is explained in Section 3.10.

The residual signal $r(n)$, which is needed for finding the target vector, is also used in the adaptive
codebook search to extend the past excitation buffer. This simplifies the adaptive codebook search
procedure for delays less than the subframe size of 40, as will be explained in the next section. The
LP residual is given by

$$ r(n) = s(n) + \sum_{i=1}^{10} \hat{a}_i\, s(n - i), \quad n = 0, \ldots, 39. \tag{36} $$

3.7 Adaptive-codebook search 

The adaptive-codebook parameters (or pitch parameters) are the delay and gain. In the adaptive
codebook approach for implementing the pitch filter, the excitation is repeated for delays less than
the subframe length. In the search stage, the excitation is extended by the LP residual to simplify
the closed-loop search. The adaptive-codebook search is done every (5 ms) subframe. In the first
subframe, a fractional pitch delay $T_1$ is used with a resolution of 1/3 in the range [19 1/3, 84 2/3] and
integers only in the range [85, 143]. For the second subframe, a delay $T_2$ with a resolution of 1/3
is always used in the range [(int)$T_1$ - 5 2/3, (int)$T_1$ + 4 2/3], where (int)$T_1$ is the nearest integer to
the fractional pitch delay $T_1$ of the first subframe. This range is adapted for the cases where $T_1$
straddles the boundaries of the delay range.

For each subframe the optimal delay is determined using closed-loop analysis that minimizes
the weighted mean-squared error. In the first subframe the delay $T_1$ is found by searching a small
range (6 samples) of delay values around the open-loop delay $T_{op}$ (see Section 3.4). The search
boundaries $t_{min}$ and $t_{max}$ are defined by

    t_min = T_op - 3
    if t_min < 20 then t_min = 20
    t_max = t_min + 6
    if t_max > 143 then
        t_max = 143
    end

For the second subframe, closed-loop pitch analysis is done around the pitch selected in the first
subframe to find the optimal delay $T_2$. The search boundaries are between $t_{min} - 2/3$ and $t_{max} + 2/3$,
where $t_{min}$ and $t_{max}$ are derived from $T_1$ as follows:

    t_min = (int)T_1 - 5
    if t_min < 20 then t_min = 20
    t_max = t_min + 9
    if t_max > 143 then
        t_max = 143
        t_min = t_max - 9
    end
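
The two boundary computations can be written compactly as below. The function names are hypothetical. The first function follows the listing as printed here, which clamps $t_{max}$ without moving $t_{min}$; the second recomputes $t_{min}$ after clamping, as in the second listing.

```python
def first_subframe_bounds(t_op):
    """Integer search range around the open-loop delay T_op."""
    t_min = max(t_op - 3, 20)
    t_max = min(t_min + 6, 143)      # as printed: t_min is not readjusted here
    return t_min, t_max

def second_subframe_bounds(t1_int):
    """Integer search range around (int)T_1 for the second subframe."""
    t_min = max(t1_int - 5, 20)
    t_max = t_min + 9
    if t_max > 143:
        t_max = 143
        t_min = t_max - 9
    return t_min, t_max
```

For example, an open-loop delay of 21 gives the range 20 to 26, and a first-subframe delay of 141 gives the range 134 to 143 for the second subframe.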

The closed-loop pitch search minimizes the mean-squared weighted error between the original
and synthesized speech. This is achieved by maximizing the term

$$ R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}}, \tag{37} $$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation
convolved with $h(n)$). Note that the search range is limited around a preselected value, which is
the open-loop pitch for the first subframe, and $T_1$ for the second subframe.

The convolution $y_k(n)$ is computed for the delay $t_{min}$, and for the other integer delays in the
search range $k = t_{min} + 1, \ldots, t_{max}$, it is updated using the recursive relation

$$ y_k(n) = y_{k-1}(n - 1) + u(-k)\, h(n), \quad n = 39, \ldots, 0, \tag{38} $$

where $u(n)$, $n = -143, \ldots, 39$, is the excitation buffer, and $y_{k-1}(-1) = 0$. Note that in the search
stage, the samples $u(n)$, $n = 0, \ldots, 39$, are not known, and they are needed for pitch delays less
than 40. To simplify the search, the LP residual is copied to $u(n)$ to make the relation in Eq. (38)
valid for all delays.
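
The recursive update of Eq. (38) can be checked against a direct convolution with a small sketch. The buffer layout (`u[past + n]` holds $u(n)$) and the function names are assumptions for illustration.

```python
def filtered_excitation(u, past, h, k):
    """Direct convolution: y_k(n) = sum_{i=0}^{n} u(i - k) h(n - i)."""
    return [sum(u[past + i - k] * h[n - i] for i in range(n + 1))
            for n in range(len(h))]

def update_filtered_excitation(y_prev, u, past, h, k):
    """Eq. (38): obtain y_k from y_{k-1} with one multiply-add per sample."""
    y = [0.0] * len(h)
    for n in range(len(h) - 1, 0, -1):
        y[n] = y_prev[n - 1] + u[past - k] * h[n]
    y[0] = u[past - k] * h[0]        # y_{k-1}(-1) = 0
    return y
```

The direct form costs on the order of $N^2$ operations per delay for a subframe of $N$ samples, while the update needs only $N$ multiply-adds, which is why the full convolution is computed only once, for $t_{min}$.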

For the determination of $T_2$, and $T_1$ if the optimum integer closed-loop delay is less than 84,
the fractions around the optimum integer delay have to be tested. The fractional pitch search
is done by interpolating the normalized correlation in Eq. (37) and searching for its maximum.
The interpolation is done using a FIR filter $b_{12}$ based on a Hamming windowed sinc function with
the sinc truncated at $\pm 11$ and padded with zeros at $\pm 12$ ($b_{12}(12) = 0$). The filter has its cut-off
frequency (-3 dB) at 3600 Hz in the oversampled domain. The interpolated values of $R(k)$ for the
fractions -1/3, 0, 1/3, and 2/3 are obtained using the interpolation formula

$$ R(k)_t = \sum_{i=0}^{3} R(k - i)\, b_{12}(t + 3i) + \sum_{i=0}^{3} R(k + 1 + i)\, b_{12}(3 - t + 3i), \quad t = 0, 1, 2, \tag{39} $$

where $t = 0, 1, 2$ corresponds to the fractions 0, 1/3, and 2/3, respectively. Note that it is necessary
to compute correlation terms in Eq. (37) using a range $t_{min} - 4$, $t_{max} + 4$, to allow for the proper
interpolation.



3.7.1 Generation of the adaptive codebook vector 

Once the noninteger pitch delay has been determined, the adaptive codebook vector $v(n)$ is
computed by interpolating the past excitation signal $u(n)$ at the given integer delay $k$ and fraction
$t$:

$$ v(n) = \sum_{i=0}^{9} u(n - k + i)\, b_{30}(t + 3i) + \sum_{i=0}^{9} u(n - k + 1 + i)\, b_{30}(3 - t + 3i), \quad n = 0, \ldots, 39, \quad t = 0, 1, 2. \tag{40} $$

The interpolation filter $b_{30}$ is based on a Hamming windowed sinc function with the sinc truncated
at $\pm 29$ and padded with zeros at $\pm 30$ ($b_{30}(30) = 0$). The filter has a cut-off frequency (-3 dB) at
3600 Hz in the oversampled domain.

3.7.2 Codeword computation for adaptive codebook delays

The pitch delay $T_1$ is encoded with 8 bits in the first subframe and the relative delay in the second
subframe is encoded with 5 bits. A fractional delay $T$ is represented by its integer part (int)$T$,
and a fractional part $frac/3$, $frac = -1, 0, 1$. The pitch index P1 is now encoded as

$$ P1 = \begin{cases} ((int)T_1 - 19) \cdot 3 + frac - 1, & \text{if } T_1 = [19, \ldots, 85],\ frac = [-1, 0, 1], \\ ((int)T_1 - 85) + 197, & \text{if } T_1 = [86, \ldots, 143],\ frac = 0. \end{cases} \tag{41} $$

The value of the pitch delay $T_2$ is encoded relative to the value of $T_1$. Using the same
interpretation as before, the fractional delay $T_2$, represented by its integer part (int)$T_2$ and a fractional
part $frac/3$, $frac = -1, 0, 1$, is encoded as

$$ P2 = ((int)T_2 - t_{min}) \cdot 3 + frac + 2, \tag{42} $$

where $t_{min}$ is derived from $T_1$ as before.

To make the coder more robust against random bit errors, a parity bit P0 is computed on the
delay index of the first subframe. The parity bit is generated through an XOR operation on the
6 most significant bits of P1. At the decoder this parity bit is recomputed and if the recomputed
value does not agree with the transmitted value, an error concealment procedure is applied.
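
The index computation and the parity bit can be illustrated as follows. The function names are hypothetical, and the parity bit is taken to be the plain XOR of the 6 most significant bits of the 8-bit index, as stated above.

```python
def encode_p1(t_int, frac):
    """8-bit index of the first-subframe delay (integer part and frac in {-1, 0, 1})."""
    if t_int <= 85:
        return (t_int - 19) * 3 + frac - 1
    return (t_int - 85) + 197            # integer delays only, frac = 0

def encode_p2(t_int, frac, t_min):
    """5-bit index of the second-subframe delay, relative to t_min."""
    return (t_int - t_min) * 3 + frac + 2

def parity_p0(p1):
    """XOR of the 6 most significant bits of the 8-bit index P1."""
    bits = (p1 >> 2) & 0x3F
    parity = 0
    while bits:
        parity ^= bits & 1
        bits >>= 1
    return parity
```

The smallest delay, 19 1/3, maps to index 0 and the largest delay, 143, maps to index 255, so the first index fits in 8 bits.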



3.7.3 Computation of the adaptive-codebook gain

Once the adaptive-codebook delay is determined, the adaptive-codebook gain $g_p$ is computed as

$$ g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \quad \text{bounded by } 0 \le g_p \le 1.2, \tag{43} $$

where $y(n)$ is the filtered adaptive codebook vector (zero-state response of $W(z)/\hat{A}(z)$ to $v(n)$).
This vector is obtained by convolving $v(n)$ with $h(n)$:

$$ y(n) = \sum_{i=0}^{n} v(i)\, h(n - i), \quad n = 0, \ldots, 39. \tag{44} $$

Note that by maximizing the term in Eq. (37) in most cases $g_p > 0$. In case the signal contains
only negative correlations, the value of $g_p$ is set to 0.
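
Eqs. (43) and (44) amount to a truncated convolution followed by a clamped least-squares gain. A floating-point sketch, with hypothetical function names, is:

```python
def convolve(v, h):
    """Eq. (44): zero-state response truncated to the subframe length."""
    return [sum(v[i] * h[n - i] for i in range(n + 1)) for n in range(len(h))]

def adaptive_codebook_gain(x, y):
    """Eq. (43): least-squares gain, bounded to the range [0, 1.2]."""
    energy = sum(val * val for val in y)
    if energy == 0.0:
        return 0.0                       # guard against division by zero
    gain = sum(a * b for a, b in zip(x, y)) / energy
    return min(1.2, max(0.0, gain))
```

The guard for zero energy is an addition of this sketch and is not described in the text.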



3.8 Fixed codebook: structure and search

The fixed codebook is based on an algebraic codebook structure using an interleaved single-pulse
permutation (ISPP) design. In this codebook, each codebook vector contains 4 non-zero pulses.
Each pulse can have either the amplitude +1 or -1, and can assume the positions given in Table 7.

Table 7: Structure of fixed codebook $\mathcal{C}$.

Pulse   Sign   Positions
i0      s0     0, 5, 10, 15, 20, 25, 30, 35
i1      s1     1, 6, 11, 16, 21, 26, 31, 36
i2      s2     2, 7, 12, 17, 22, 27, 32, 37
i3      s3     3, 8, 13, 18, 23, 28, 33, 38,
               4, 9, 14, 19, 24, 29, 34, 39

The codebook vector $c(n)$ is constructed by taking a zero vector, and putting the 4 unit pulses
at the found locations, multiplied with their corresponding sign:

$$ c(n) = s0\, \delta(n - i0) + s1\, \delta(n - i1) + s2\, \delta(n - i2) + s3\, \delta(n - i3), \quad n = 0, \ldots, 39, \tag{45} $$

where $\delta(0)$ is a unit pulse. A special feature incorporated in the codebook is that the selected
codebook vector is filtered through an adaptive pre-filter $P(z)$ which enhances harmonic components
to improve the synthesized speech quality. Here the filter

$$ P(z) = 1/(1 - \beta z^{-T}) \tag{46} $$

is used, where $T$ is the integer component of the pitch delay of the current subframe, and $\beta$ is a
pitch gain. The value of $\beta$ is made adaptive by using the quantized adaptive codebook gain from
the previous subframe bounded by 0.2 and 0.8.



$$ \beta = \hat{g}_p^{(m-1)}, \quad 0.2 \le \beta \le 0.8. \tag{47} $$

This filter enhances the harmonic structure for delays less than the subframe size of 40. This
modification is incorporated in the fixed codebook search by modifying the impulse response $h(n)$,
according to

$$ h(n) = h(n) + \beta h(n - T), \quad n = T, \ldots, 39. \tag{48} $$
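
The pitch filter gain of Eq. (47) and the modification of the impulse response in Eq. (48) can be sketched as follows. The function names are hypothetical. The sketch assumes that Eq. (48) is applied in place for increasing $n$, so that already modified samples feed later ones, which matches the recursive form of $P(z)$ in Eq. (46).

```python
def pitch_filter_gain(prev_gain_quantized):
    """Eq. (47): previous-subframe quantized gain, bounded by 0.2 and 0.8."""
    return min(0.8, max(0.2, prev_gain_quantized))

def apply_pitch_prefilter(h, delay, beta):
    """Eq. (48), applied in place for n = delay, ..., len(h) - 1 (assumed order)."""
    out = list(h)
    for n in range(delay, len(out)):
        out[n] += beta * out[n - delay]
    return out
```

When the integer delay is 40 or more, the loop does not execute and the impulse response of the 40-sample subframe is left unchanged.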

3.8.1 Fixed-codebook search procedure

The fixed codebook is searched by minimizing the mean-squared error between the weighted input
speech $sw(n)$ of Eq. (33), and the weighted reconstructed speech. The target signal used in the
closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is

$$ x_2(n) = x(n) - g_p\, y(n), \quad n = 0, \ldots, 39, \tag{49} $$

where $y(n)$ is the filtered adaptive codebook vector of Eq. (44).

The matrix $H$ is defined as the lower triangular Toeplitz convolution matrix with diagonal $h(0)$
and lower diagonals $h(1), \ldots, h(39)$. If $c_k$ is the algebraic codevector at index $k$, then the codebook
is searched by maximizing the term

$$ \frac{C_k^2}{E_k} = \frac{\left(\sum_{n=0}^{39} d(n)\, c_k(n)\right)^2}{c_k^t\, \Phi\, c_k}, \tag{50} $$

where $d(n)$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, and
$\Phi = H^t H$ is the matrix of correlations of $h(n)$. The signal $d(n)$ and the matrix $\Phi$ are computed
before the codebook search. The elements of $d(n)$ are computed from

$$ d(n) = \sum_{i=n}^{39} x_2(i)\, h(i - n), \quad n = 0, \ldots, 39, \tag{51} $$

and the elements of the symmetric matrix $\Phi$ are computed by

$$ \phi(i, j) = \sum_{n=j}^{39} h(n - i)\, h(n - j), \quad (j \ge i). \tag{52} $$

Note that only the elements actually needed are computed and an efficient storage procedure
has been designed to speed up the search procedure.

The algebraic structure of the codebook $\mathcal{C}$ allows for a fast search procedure since the codebook
vector $c_k$ contains only four nonzero pulses. The correlation in the numerator of Eq. (50) for a
given vector $c_k$ is given by

$$ C = \sum_{i=0}^{3} a_i\, d(m_i), \tag{53} $$

where $m_i$ is the position of the ith pulse and $a_i$ is its amplitude. The energy in the denominator
of Eq. (50) is given by

$$ E = \sum_{i=0}^{3} \phi(m_i, m_i) + 2 \sum_{i=0}^{2} \sum_{j=i+1}^{3} a_i a_j\, \phi(m_i, m_j). \tag{54} $$

To simplify the search procedure the pulse amplitudes are predetermined by quantizing the signal $d(n)$. This is done by setting the amplitude of a pulse at a certain position equal to the sign of $d(n)$ at that position. Before the codebook search, the following steps are done. First, the signal $d(n)$ is decomposed into two signals: the absolute signal $d'(n) = |d(n)|$ and the sign signal $\mathrm{sign}[d(n)]$. Second, the matrix $\Phi$ is modified by including the sign information; that is,

$$\phi'(i, j) = \mathrm{sign}[d(i)] \, \mathrm{sign}[d(j)] \, \phi(i, j), \quad i = 0, \ldots, 39, \quad j = i, \ldots, 39. \tag{55}$$

To remove the factor 2 in Eq. (54), the diagonal elements are scaled:

$$\phi'(i, i) = 0.5 \, \phi(i, i), \quad i = 0, \ldots, 39. \tag{56}$$

The correlation in Eq. (53) is now given by

$$C = d'(m_0) + d'(m_1) + d'(m_2) + d'(m_3), \tag{57}$$

and the energy in Eq. (54) is given by

$$\begin{aligned}
E = {} & \phi'(m_0, m_0) \\
& + \phi'(m_1, m_1) + \phi'(m_0, m_1) \\
& + \phi'(m_2, m_2) + \phi'(m_0, m_2) + \phi'(m_1, m_2) \\
& + \phi'(m_3, m_3) + \phi'(m_0, m_3) + \phi'(m_1, m_3) + \phi'(m_2, m_3).
\end{aligned} \tag{58}$$

Because of the scaling in Eq. (56), the right-hand side of Eq. (58) equals half the energy of Eq. (54). This constant factor does not change which codevector maximizes the search criterion.

A focused search approach is used to further simplify the search procedure. In this approach a precomputed threshold is tested before entering the last loop, and the loop is entered only if this threshold is exceeded. The maximum number of times the loop can be entered is fixed so that a low percentage of the codebook is searched. The threshold is computed based on the correlation $C$. The maximum absolute correlation and the average correlation due to the contribution of the first three pulses, $max_3$ and $av_3$, are found before the codebook search. The threshold is given by

$$thr_3 = av_3 + K_3 (max_3 - av_3). \tag{59}$$

The fourth loop is entered only if the absolute correlation (due to three pulses) exceeds $thr_3$, where $0 \leq K_3 < 1$. The value of $K_3$ controls the percentage of codebook search and it is set here to 0.4. Note that this results in a variable search time, and to further control the search the number of times the last loop is entered (for the 2 subframes) cannot exceed a certain maximum, which is set here to 180 (the average worst case per subframe is 90 times).

3.8.2 Codeword computation of the fixed codebook 

The pulse positions of the pulses $i0$, $i1$, and $i2$ are encoded with 3 bits each, while the position of $i3$ is encoded with 4 bits. Each pulse amplitude is encoded with 1 bit. This gives a total of 17 bits for the 4 pulses. By defining $s = 1$ if the sign is positive and $s = 0$ if the sign is negative, the sign codeword is obtained from

$$S = s0 + 2 \cdot s1 + 4 \cdot s2 + 8 \cdot s3, \tag{60}$$

and the fixed codebook codeword is obtained from

$$C = (i0/5) + 8 \cdot (i1/5) + 64 \cdot (i2/5) + 512 \cdot (2 \cdot (i3/5) + jx), \tag{61}$$

where the divisions are integer divisions, $jx = 0$ if $i3 = 3, 8, \ldots$, and $jx = 1$ if $i3 = 4, 9, \ldots$
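The packing of Eqs. (60) and (61) can be sketched in C as follows. This is an illustrative sketch, not the reference simulation code: the function names are hypothetical, and the track layout checked by the assertions (pulse i0 on positions 0, 5, ..., i1 on 1, 6, ..., i2 on 2, 7, ...) is an assumption that this section does not restate.

```c
#include <assert.h>

/* Sign codeword of Eq. (60): each s is 1 for a positive pulse and 0 for a
 * negative one. */
int sign_codeword(int s0, int s1, int s2, int s3)
{
    assert((s0 | s1 | s2 | s3) >= 0 && (s0 | s1 | s2 | s3) <= 1);
    return s0 + 2 * s1 + 4 * s2 + 8 * s3;
}

/* Position codeword of Eq. (61). All divisions are integer divisions, so
 * each i/5 is the 3-bit index of the pulse within its track. */
int position_codeword(int i0, int i1, int i2, int i3)
{
    int jx;

    /* Assumed track layout (hypothetical, see the lead-in). */
    assert(i0 >= 0 && i0 < 40 && i0 % 5 == 0);
    assert(i1 >= 0 && i1 < 40 && i1 % 5 == 1);
    assert(i2 >= 0 && i2 < 40 && i2 % 5 == 2);
    assert(i3 >= 0 && i3 < 40 && (i3 % 5 == 3 || i3 % 5 == 4));

    /* jx selects between the two interleaved tracks of the fourth pulse. */
    jx = (i3 % 5 == 4) ? 1 : 0;
    return (i0 / 5) + 8 * (i1 / 5) + 64 * (i2 / 5)
         + 512 * (2 * (i3 / 5) + jx);
}
```

With 3 + 3 + 3 + 4 position bits the largest codeword is 8191, which fits the 13 bits allotted to the fixed codebook index; the 4 sign bits bring the total to 17.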

3.9 Quantization of the gains 

The adaptive-codebook gain (pitch gain) and the fixed (algebraic) codebook gain are vector quantized using 7 bits. The gain codebook search is done by minimizing the mean-squared weighted error between original and reconstructed speech which is given by

$$E = \mathbf{x}^t \mathbf{x} + g_p^2 \mathbf{y}^t \mathbf{y} + g_c^2 \mathbf{z}^t \mathbf{z} - 2 g_p \mathbf{x}^t \mathbf{y} - 2 g_c \mathbf{x}^t \mathbf{z} + 2 g_p g_c \mathbf{y}^t \mathbf{z}, \tag{62}$$

where $\mathbf{x}$ is the target vector (see Section 3.6), $\mathbf{y}$ is the filtered adaptive codebook vector of Eq. (44), and $\mathbf{z}$ is the fixed codebook vector convolved with $h(n)$,

$$z(n) = \sum_{i=0}^{n} c(i) h(n - i), \quad n = 0, \ldots, 39. \tag{63}$$

3.9.1 Gain prediction 

The fixed codebook gain $g_c$ can be expressed as

$$g_c = \gamma g'_c, \tag{64}$$

where $g'_c$ is a predicted gain based on previous fixed codebook energies, and $\gamma$ is a correction factor.

The mean energy of the fixed codebook contribution is given by

$$E = 10 \log \left( \frac{1}{40} \sum_{i=0}^{39} c_i^2 \right). \tag{65}$$

After scaling the vector $c_i$ with the fixed codebook gain $g_c$, the energy of the scaled fixed codebook is given by $20 \log g_c + E$. Let $E^{(m)}$ be the mean-removed energy (in dB) of the (scaled) fixed codebook contribution at subframe $m$, given by

$$E^{(m)} = 20 \log g_c + E - \bar{E}, \tag{66}$$

where $\bar{E} = 30$ dB is the mean energy of the fixed codebook excitation. The gain $g_c$ can be expressed as a function of $E^{(m)}$, $E$, and $\bar{E}$ by

$$g_c = 10^{(E^{(m)} + \bar{E} - E)/20}. \tag{67}$$

The predicted gain $g'_c$ is found by predicting the log-energy of the current fixed codebook contribution from the log-energy of previous fixed codebook contributions. The 4th order MA prediction is done as follows. The predicted energy is given by

$$\tilde{E}^{(m)} = \sum_{i=1}^{4} b_i \hat{R}^{(m-i)}, \tag{68}$$

where $[b_1 \; b_2 \; b_3 \; b_4] = [0.68 \; 0.58 \; 0.34 \; 0.19]$ are the MA prediction coefficients, and $\hat{R}^{(m)}$ is the quantized version of the prediction error $R^{(m)}$ at subframe $m$, defined by

$$R^{(m)} = E^{(m)} - \tilde{E}^{(m)}. \tag{69}$$

The predicted gain $g'_c$ is found by replacing $E^{(m)}$ by its predicted value in Eq. (67):

$$g'_c = 10^{(\tilde{E}^{(m)} + \bar{E} - E)/20}. \tag{70}$$

The correction factor $\gamma$ is related to the gain-prediction error by

$$R^{(m)} = E^{(m)} - \tilde{E}^{(m)} = 20 \log(\gamma). \tag{71}$$
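The MA prediction of Eqs. (68) and (70) can be sketched in the dB domain as below. The sketch is floating point and illustrative only; the function names and the "most recent first" memory layout are assumptions, and the mean codebook energy of Eq. (65) is taken as an input so that the example needs no math library. The linear gain follows as 10 raised to the returned value divided by 20.

```c
#include <assert.h>

#define MEAN_ENERGY_DB 30.0 /* mean fixed codebook energy of Eq. (66) */

/* Predicted fixed codebook gain in dB, i.e. 20*log10(g'_c) from Eq. (70).
 * e_code_db: mean energy E of the codevector, Eq. (65), in dB.
 * r_hat:     quantized prediction errors of the four previous subframes,
 *            most recent first (assumed layout). */
double predicted_gain_db(double e_code_db, const double r_hat[4])
{
    static const double b[4] = {0.68, 0.58, 0.34, 0.19};
    double e_pred = 0.0;
    int i;

    assert(r_hat != 0);
    for (i = 0; i < 4; i++)
        e_pred += b[i] * r_hat[i];               /* Eq. (68) */
    return e_pred + MEAN_ENERGY_DB - e_code_db;  /* exponent of Eq. (70) */
}

/* Shift the predictor memory and store the newest quantized prediction
 * error, 20*log10 of the quantized correction factor (Eq. (71)). */
void push_prediction_error(double r_hat[4], double new_r_db)
{
    int i;

    assert(r_hat != 0);
    for (i = 3; i > 0; i--)
        r_hat[i] = r_hat[i - 1];
    r_hat[0] = new_r_db;
}
```

Working in dB keeps the predictor a plain 4-tap dot product; only the final conversion to a linear gain needs an exponential.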
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3.9.2 Code book search for gain quantisation 

The adaptive-codebook gain, $g_p$, and the factor $\gamma$ are vector quantized using a 2-stage conjugate structured codebook. The first stage consists of a 3 bit two-dimensional codebook GA, and the second stage consists of a 4 bit two-dimensional codebook GB. The first element in each codebook represents the quantized adaptive codebook gain $\hat{g}_p$, and the second element represents the quantized fixed codebook gain correction factor $\hat{\gamma}$. Given codebook indices $m$ and $n$ for GA and GB, respectively, the quantized adaptive-codebook gain is given by

$$\hat{g}_p = GA_1(m) + GB_1(n), \tag{72}$$

and the quantized fixed-codebook gain by

$$\hat{g}_c = g'_c \hat{\gamma} = g'_c \left( GA_2(m) + GB_2(n) \right). \tag{73}$$

This conjugate structure simplifies the codebook search by applying a pre-selection process. The optimum pitch gain, $g_p$, and fixed-codebook gain, $g_c$, are derived from Eq. (62), and are used for the pre-selection. The codebook GA contains 8 entries in which the second element (corresponding to $g_c$) has in general larger values than the first element (corresponding to $g_p$). This bias allows a pre-selection using the value of $g_c$. In this pre-selection process, a cluster of 4 vectors whose second elements are close to $gx_c$ is selected, where $gx_c$ is derived from $g_c$ and $g_p$. Similarly, the codebook GB contains 16 entries which have a bias towards the first element (corresponding to $g_p$). A cluster of 8 vectors whose first elements are close to $g_p$ is selected. Hence for each codebook the best 50 % candidate vectors are selected. This is followed by an exhaustive search over the remaining 4 * 8 = 32 possibilities, such that the combination of the two indices minimizes the weighted mean-squared error of Eq. (62).
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3.9.3 Codeword computation f r gain quantiser 

The codewords GA and GB for the gain quantizer are obtained from the indices corresponding to 
the best choice. To reduce the impact of single bit errors the codebook indices are mapped. 

3.10 Memory update 

An update of the states of the synthesis and weighting filters is needed to compute the target signal in the next subframe. After the two gains are quantized, the excitation signal, $u(n)$, in the present subframe is found by

$$u(n) = \hat{g}_p v(n) + \hat{g}_c c(n), \quad n = 0, \ldots, 39, \tag{74}$$

where $\hat{g}_p$ and $\hat{g}_c$ are the quantized adaptive and fixed codebook gains, respectively, $v(n)$ is the adaptive codebook vector (interpolated past excitation), and $c(n)$ is the fixed codebook vector (algebraic codevector including pitch sharpening). The states of the filters can be updated by filtering the signal $r(n) - u(n)$ (difference between residual and excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40 sample subframe and saving the states of the filters. This would require 3 filter operations. A simpler approach, which requires only one filtering, is as follows. The local synthesis speech, $\hat{s}(n)$, is computed by filtering the excitation signal through $1/\hat{A}(z)$. The output of the filter due to the input $r(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$. So the states of the synthesis filter $1/\hat{A}(z)$ are given by $e(n)$, $n = 30, \ldots, 39$. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $ew(n)$. However, the signal $ew(n)$ can be equivalently found by

$$ew(n) = x(n) - \hat{g}_p y(n) - \hat{g}_c z(n). \tag{75}$$

Since the signals $x(n)$, $y(n)$, and $z(n)$ are available, the states of the weighting filter are updated by computing $ew(n)$ as in Eq. (75) for $n = 30, \ldots, 39$. This saves two filter operations.

3.11 Encoder and Decoder initialization 

All static encoder variables should be initialized to 0, except the variables listed in Table 8. These variables need to be initialized for the decoder as well.

Table 8: Description of parameters with nonzero initialisation.

    Variable           Reference        Initial value
                       Section 3.4      0.6
    $l_i$              Section 3.2.4    $i\pi/11$
    $q_i$              Section 3.2.4    0.9595, ...
    $\hat{R}^{(k)}$    Section 3.9.1    -14

4 Functional description of the decoder

The signal flow at the decoder was shown in Section 2 (Figure 3). First the parameters are decoded (LP coefficients, adaptive codebook vector, fixed codebook vector, and gains). These decoded parameters are used to compute the reconstructed speech signal. This process is described in Section 4.1. This reconstructed signal is enhanced by a post-processing operation consisting of a postfilter and a high-pass filter (Section 4.2). Section 4.3 describes the error concealment procedure used when either a parity error has occurred, or when the frame erasure flag has been set.

4.1 Parameter decoding procedure 

The transmitted parameters are listed in Table 9.

Table 9: Description of transmitted parameters indices. The bitstream ordering is reflected by the order in the table.

    Symbol   Description                                   Bits
    L0       Switched predictor index of LSP quantiser     1
    L1       First stage vector of LSP quantiser           7
    L2       Second stage lower vector of LSP quantiser    5
    L3       Second stage higher vector of LSP quantiser   5
    P1       Pitch delay 1st subframe                      8
    P0       Parity bit for pitch                          1
    S1       Signs of pulses 1st subframe                  4
    C1       Fixed codebook 1st subframe                   13
    GA1      Gain codebook (stage 1) 1st subframe          3
    GB1      Gain codebook (stage 2) 1st subframe          4
    P2       Pitch delay 2nd subframe                      5
    S2       Signs of pulses 2nd subframe                  4
    C2       Fixed codebook 2nd subframe                   13
    GA2      Gain codebook (stage 1) 2nd subframe          3
    GB2      Gain codebook (stage 2) 2nd subframe          4

At startup all static encoder variables should be initialized to 0, except the variables listed in Table 8. The decoding process is done in the following order:
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4.1.1 Decoding of LP filter parameters 

The received indices L0, L1, L2, and L3 of the LSP quantizer are used to reconstruct the quantized LSP coefficients using the procedure described in Section 3.2.4. The interpolation procedure described in Section 3.2.5 is used to obtain 2 interpolated LSP vectors (corresponding to 2 subframes). For each subframe, the interpolated LSP vector is converted to LP filter coefficients $a_i$, which are used for synthesizing the reconstructed speech in the subframe.

The following steps are repeated for each subframe: 

1. decoding of the adaptive codebook vector, 

2. decoding of the fixed codebook vector, 

3. decoding of the adaptive and fixed codebook gains, 

4. computation of the reconstructed speech, 

4.1.2 Decoding of the adaptive codebook vector 

The received adaptive codebook index is used to find the integer and fractional parts of the pitch delay. The integer part $(int)T_1$ and fractional part $frac$ of $T_1$ are obtained from P1 as follows:

    if P1 < 197
        (int)T1 = (P1 + 2)/3 + 19
        frac = P1 - (int)T1*3 + 58
    else
        (int)T1 = P1 - 112
        frac = 0
    end

The integer and fractional part of $T_2$ are obtained from P2 and $t_{min}$, where $t_{min}$ is derived from P1 as follows:

    tmin = (int)T1 - 5
    if tmin < 20 then tmin = 20
    tmax = tmin + 9
    if tmax > 143 then
        tmax = 143
        tmin = tmax - 9
    end

Now $T_2$ is obtained from

    (int)T2 = (P2 + 2)/3 - 1 + tmin
    frac = P2 - 2 - ((P2 + 2)/3 - 1)*3

The adaptive codebook vector $v(n)$ is found by interpolating the past excitation $u(n)$ (at the pitch delay) using Eq. (40).
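The delay decoding rules above translate almost line for line into C. The following is a minimal sketch with hypothetical type and function names; it assumes P1 is an 8-bit index and P2 a 5-bit index (as listed in Table 9) and that `frac` counts thirds of a sample, taking the values -1, 0 or 1.

```c
#include <assert.h>

/* Decoded pitch delay: integer part plus a fraction in units of 1/3. */
typedef struct {
    int t_int; /* integer part of the delay */
    int frac;  /* fractional part: -1, 0 or 1 (thirds of a sample) */
} PitchDelay;

/* First subframe: decode T1 from the 8-bit index P1. */
PitchDelay decode_p1(int p1)
{
    PitchDelay d;

    assert(p1 >= 0 && p1 <= 255);
    if (p1 < 197) {
        d.t_int = (p1 + 2) / 3 + 19;
        d.frac = p1 - d.t_int * 3 + 58;
    } else {
        d.t_int = p1 - 112;
        d.frac = 0;
    }
    return d;
}

/* Second subframe: decode T2 from the 5-bit index P2, relative to a
 * search window [tmin, tmax] derived from the integer part of T1. */
PitchDelay decode_p2(int p2, int t1_int)
{
    PitchDelay d;
    int t_min, t_max, k;

    assert(p2 >= 0 && p2 <= 31);
    t_min = t1_int - 5;
    if (t_min < 20)
        t_min = 20;
    t_max = t_min + 9;
    if (t_max > 143) {
        t_max = 143;
        t_min = t_max - 9;
    }
    k = (p2 + 2) / 3 - 1;       /* integer offset inside the window */
    d.t_int = k + t_min;
    d.frac = p2 - 2 - k * 3;
    return d;
}
```

For instance, P1 = 196 decodes to an integer part of 85 with frac = -1 (a delay of 84 2/3), while P1 = 197 is the first integer-only delay, 85.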



4.1.3 Decoding of the fixed codebook vector 

The received fixed codebook index C is used to extract the positions of the excitation pulses. The pulse signs are obtained from S. Once the pulse positions and signs are decoded the fixed codebook vector $c(n)$ can be constructed. If the integer part of the pitch delay, $T$, is less than the subframe size 40, the pitch enhancement procedure is applied which modifies $c(n)$ according to Eq. (48).

4.1.4 Decoding of the adaptive and fixed codebook gains 

The received gain codebook index gives the adaptive codebook gain $\hat{g}_p$ and the fixed codebook gain correction factor $\hat{\gamma}$. This procedure is described in detail in Section 3.9. The estimated fixed codebook gain $g'_c$ is found using Eq. (70). The fixed codebook gain is obtained from the product of the quantized gain correction factor with this predicted gain (Eq. (64)). The adaptive codebook gain is reconstructed using Eq. (72).
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4.1.5 Computation of the parity bit 

Before the speech is reconstructed, the parity bit is recomputed from the adaptive codebook delay (Section 3.7.2). If this bit is not identical to the transmitted parity bit P0, it is likely that bit errors occurred during transmission and the error concealment procedure of Section 4.3 is used.



4.1.6 Computing the reconstructed speech 

The excitation $u(n)$ (see Eq. (74)) is input to the LP synthesis filter. The reconstructed speech for the subframe is given by

$$\hat{s}(n) = u(n) - \sum_{i=1}^{10} \hat{a}_i \hat{s}(n - i), \quad n = 0, \ldots, 39, \tag{76}$$

where $\hat{a}_i$ are the interpolated LP filter coefficients.

The reconstructed speech $\hat{s}(n)$ is then processed by a post processor which is described in the next section.
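Eq. (76) is a 10th order all-pole recursion. A floating-point sketch is given below; the function name, the array layout (`a[0]` equal to 1, `a[1..10]` holding the interpolated coefficients) and the convention that `mem[j]` holds the output sample at time -1-j from the previous subframe are assumptions made for the example.

```c
#include <assert.h>

#define LP_ORDER 10
#define SUBFRAME 40

/* LP synthesis filter of Eq. (76).
 * a:   LP coefficients, a[0] = 1, a[1..10] as in Eq. (76).
 * u:   excitation for the subframe.
 * s:   output, reconstructed speech for the subframe.
 * mem: last 10 outputs of the previous subframe, mem[j] = s(-1-j);
 *      updated on return so that subframes can be chained. */
void lp_synthesis(const double a[LP_ORDER + 1], const double u[SUBFRAME],
                  double s[SUBFRAME], double mem[LP_ORDER])
{
    int n, i;

    assert(a[0] == 1.0);
    for (n = 0; n < SUBFRAME; n++) {
        double acc = u[n];
        for (i = 1; i <= LP_ORDER; i++) {
            /* Samples before the subframe start come from the memory. */
            double past = (n - i >= 0) ? s[n - i] : mem[i - n - 1];
            acc -= a[i] * past;
        }
        s[n] = acc;
    }
    for (i = 0; i < LP_ORDER; i++)
        mem[i] = s[SUBFRAME - 1 - i];
}
```

Keeping the filter memory outside the function is what lets the same routine serve both the per-subframe decoding loop and the state updates described for the encoder.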

4.2 Post-processing 

Post-processing consists of three functions: adaptive postfiltering, high-pass filtering, and signal up-scaling. The adaptive postfilter is the cascade of three filters: a pitch postfilter $H_p(z)$, a short-term postfilter $H_f(z)$, and a tilt compensation filter $H_t(z)$, followed by an adaptive gain control procedure. The postfilter is updated every subframe of 5 ms. The postfiltering process is organised as follows. First, the synthesis speech $\hat{s}(n)$ is inverse filtered through $\hat{A}(z/\gamma_n)$ to produce the residual signal $\hat{r}(n)$. The signal $\hat{r}(n)$ is used to compute the pitch delay $T$ and gain $g_{pit}$. The signal $\hat{r}(n)$ is filtered through the pitch postfilter $H_p(z)$ to produce the signal $r'(n)$ which, in its turn, is filtered by the synthesis filter $1/[g_f \hat{A}(z/\gamma_d)]$. Finally, the signal at the output of the synthesis filter $1/[g_f \hat{A}(z/\gamma_d)]$ is passed to the tilt compensation filter $H_t(z)$ resulting in the postfiltered synthesis speech signal $sf(n)$. Adaptive gain control is then applied between $sf(n)$ and $\hat{s}(n)$ resulting in the signal $sf'(n)$. The high-pass filtering and scaling operation operate on the postfiltered signal $sf'(n)$.

4.2.1 Pitch postfilter

The pitch, or harmonic, postfilter is given by

$$H_p(z) = \frac{1}{1 + g_0} \left( 1 + g_0 z^{-T} \right), \tag{77}$$

where $T$ is the pitch delay and $g_0$ is a gain factor given by

$$g_0 = \gamma_p g_{pit}, \tag{78}$$

where $g_{pit}$ is the pitch gain. Both the pitch delay and gain are determined from the decoder output signal. Note that $g_{pit}$ is bounded by 1, and it is set to zero if the pitch prediction gain is less than 3 dB. The factor $\gamma_p$ controls the amount of harmonic postfiltering and has the value $\gamma_p = 0.5$. The pitch delay and gain are computed from the residual signal $\hat{r}(n)$ obtained by filtering the speech $\hat{s}(n)$ through $\hat{A}(z/\gamma_n)$, which is the numerator of the short-term postfilter (see Section 4.2.2):

$$\hat{r}(n) = \hat{s}(n) + \sum_{i=1}^{10} \gamma_n^i \hat{a}_i \hat{s}(n - i). \tag{79}$$
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The pitch delay is computed using a two pass procedure. The first pass selects the best integer To 
in the range $[T_1 - 1, T_1 + 1]$, where $T_1$ is the integer part of the (transmitted) pitch delay in the first subframe. The best integer delay is the one that maximizes the correlation

$$R(k) = \sum_{n=0}^{39} \hat{r}(n) \hat{r}(n - k). \tag{80}$$

The second pass chooses the best fractional delay $T$ with resolution 1/8 around $T_0$. This is done by finding the delay with the highest normalized correlation,

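$$R'(k) = \frac{\sum_{n=0}^{39} \hat{r}(n) \hat{r}_k(n)}{\sqrt{\sum_{n=0}^{39} \hat{r}_k(n) \hat{r}_k(n)}}, \tag{81}$$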

where $\hat{r}_k(n)$ is the residual signal at delay $k$. Once the optimal delay $T$ is found, the corresponding correlation value is compared against a threshold. If $R'(T) < 0.5$ then the harmonic postfilter is disabled by setting $g_{pit} = 0$. Otherwise the value of $g_{pit}$ is computed from:

$$g_{pit} = \frac{\sum_{n=0}^{39} \hat{r}(n) \hat{r}_k(n)}{\sum_{n=0}^{39} \hat{r}_k(n) \hat{r}_k(n)}, \quad \text{bounded by } 0 \leq g_{pit} \leq 1.0. \tag{82}$$

The noninteger delayed signal $\hat{r}_k(n)$ is first computed using an interpolation filter of length 33. After the selection of $T$, $\hat{r}_k(n)$ is recomputed with a longer interpolation filter of length 129. The new signal replaces the previous one only if the longer filter increases the value of $R'(T)$.

4.2.2 Short-term postfilter

The short-term postfilter is given by

$$H_f(z) = \frac{1}{g_f} \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}, \tag{83}$$

where $\hat{A}(z)$ is the received quantized LP inverse filter (LP analysis is not done at the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of short-term postfiltering, and are set to $\gamma_n = 0.35$ and $\gamma_d = 0.7$. The gain term $g_f$ is calculated on the truncated impulse response, $h_f(n)$, of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$ and given by

$$g_f = \sum_{n=0}^{19} |h_f(n)|. \tag{84}$$



4.2.3 Tilt compensation 

Finally, the filter $H_t(z)$ compensates for the tilt in the short-term postfilter $H_f(z)$ and is given by

$$H_t(z) = \frac{1}{g_t} \left( 1 + \gamma_t k_1 z^{-1} \right), \tag{85}$$

where $\gamma_t k_1$ is a tilt factor, $k_1$ being the first reflection coefficient calculated on $h_f(n)$ with

$$k_1 = -\frac{r_h(1)}{r_h(0)}; \quad r_h(i) = \sum_{j=0}^{19-i} h_f(j) h_f(j + i). \tag{86}$$

The gain term $g_t = 1 - |\gamma_t k_1|$ compensates for the decreasing effect of $g_f$ in $H_f(z)$. Furthermore, it has been shown that the product filter $H_f(z) H_t(z)$ has generally no gain.

Two values for $\gamma_t$ are used depending on the sign of $k_1$. If $k_1$ is negative, $\gamma_t = 0.9$, and if $k_1$ is positive, $\gamma_t = 0.2$.
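The tilt compensation parameters can be derived from the truncated impulse response as sketched below. The struct and function names are hypothetical, the sketch is floating point, and the handling of $k_1 = 0$ (treated like the positive case) is an assumption, since the text only covers negative and positive values.

```c
#include <assert.h>

#define HF_LEN 20 /* length of the truncated impulse response h_f(n) */

typedef struct {
    double k1;      /* first reflection coefficient, Eq. (86) */
    double gamma_t; /* 0.9 if k1 is negative, 0.2 otherwise */
    double g_t;     /* gain term 1 - |gamma_t * k1| */
} Tilt;

Tilt tilt_params(const double h_f[HF_LEN])
{
    Tilt t;
    double r0 = 0.0, r1 = 0.0, mag;
    int j;

    for (j = 0; j < HF_LEN; j++)
        r0 += h_f[j] * h_f[j];       /* r_h(0) */
    for (j = 0; j < HF_LEN - 1; j++)
        r1 += h_f[j] * h_f[j + 1];   /* r_h(1) */
    assert(r0 > 0.0);

    t.k1 = -r1 / r0;
    t.gamma_t = (t.k1 < 0.0) ? 0.9 : 0.2;
    mag = t.gamma_t * t.k1;
    if (mag < 0.0)
        mag = -mag;
    t.g_t = 1.0 - mag;
    return t;
}
```

The filter of Eq. (85) is then applied sample by sample as y(n) = (x(n) + gamma_t * k1 * x(n-1)) / g_t.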
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4.2.4 Adaptive gain control 

Adaptive gain control is used to compensate for gain differences between the reconstructed speech signal $\hat{s}(n)$ and the postfiltered signal $sf(n)$. The gain scaling factor $G$ for the present subframe is computed by

$$G = \frac{\sum_{n=0}^{39} |\hat{s}(n)|}{\sum_{n=0}^{39} |sf(n)|}. \tag{87}$$

The gain-scaled postfiltered signal $sf'(n)$ is given by

$$sf'(n) = g(n) \, sf(n), \quad n = 0, \ldots, 39, \tag{88}$$

where $g(n)$ is updated on a sample-by-sample basis and given by

$$g(n) = 0.85 \, g(n - 1) + 0.15 \, G, \quad n = 0, \ldots, 39. \tag{89}$$

The initial value is $g(-1) = 1.0$.
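The sample-by-sample smoothing of Eqs. (88) and (89) can be sketched as follows. Two points are assumptions of this sketch rather than statements of the text: the scaling factor G is taken as the ratio of the summed magnitudes of the reconstructed and postfiltered signals, and G falls back to 1 when the postfiltered subframe is all zero. The function name is hypothetical.

```c
#include <assert.h>

#define SUBFRAME 40

/* Adaptive gain control for one subframe.
 * s_hat:   reconstructed speech (reference level).
 * sf:      postfiltered signal, scaled in place to give sf'(n).
 * g_state: smoothed gain g(n-1) on entry, g(39) on return;
 *          initialise to 1.0 before the first subframe. */
void adaptive_gain_control(const double s_hat[SUBFRAME], double sf[SUBFRAME],
                           double *g_state)
{
    double sum_in = 0.0, sum_out = 0.0, G;
    int n;

    assert(g_state != 0);
    for (n = 0; n < SUBFRAME; n++) {
        sum_in += (s_hat[n] < 0.0) ? -s_hat[n] : s_hat[n];
        sum_out += (sf[n] < 0.0) ? -sf[n] : sf[n];
    }
    /* Assumed definition of G: ratio of summed magnitudes. */
    G = (sum_out > 0.0) ? sum_in / sum_out : 1.0;

    for (n = 0; n < SUBFRAME; n++) {
        *g_state = 0.85 * (*g_state) + 0.15 * G; /* Eq. (89) */
        sf[n] = (*g_state) * sf[n];              /* Eq. (88) */
    }
}
```

The one-pole smoothing means a level change is applied gradually over the subframe instead of as a step at the subframe boundary.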

4.2.5 High-pass filtering and up-scaling

A high-pass filter at a cutoff frequency of 100 Hz is applied to the reconstructed and postfiltered speech $sf'(n)$. The filter is given by

$$H_{h2}(z) = \frac{0.93980581 - 1.8795834 z^{-1} + 0.93980581 z^{-2}}{1 - 1.9330735 z^{-1} + 0.93589199 z^{-2}}. \tag{90}$$

Up-scaling consists of multiplying the high-pass filtered output by a factor 2 to retrieve the input signal level.
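The second-order section of Eq. (90) followed by the up-scaling by 2 can be sketched as a direct-form difference equation. The state struct and function name are hypothetical, and the sketch uses floating point rather than the 16-bit arithmetic of the simulation code.

```c
#include <assert.h>

/* Filter state: the two previous inputs and outputs, all zero at start. */
typedef struct {
    double x1, x2; /* x(n-1), x(n-2) */
    double y1, y2; /* y(n-1), y(n-2) */
} HighPassState;

/* Process one sample through the high-pass filter of Eq. (90) and return
 * the filtered sample multiplied by 2 (up-scaling). */
double highpass_upscale(HighPassState *st, double x)
{
    double y;

    assert(st != 0);
    y = 0.93980581 * x - 1.8795834 * st->x1 + 0.93980581 * st->x2
      + 1.9330735 * st->y1 - 0.93589199 * st->y2;

    st->x2 = st->x1;
    st->x1 = x;
    st->y2 = st->y1;
    st->y1 = y;
    return 2.0 * y;
}
```

Feeding a constant input shows the intended behaviour: after the initial transient the output settles close to zero, since the filter strongly attenuates DC.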







4.3 Concealment of frame erasures and parity errors

An error concealment procedure has been incorporated in the decoder to reduce the degradations in the reconstructed speech because of frame erasures or random errors in the bitstream. This error concealment process is functional when either i) the frame of coder parameters (corresponding to a 10 ms frame) has been identified as being erased, or ii) a checksum error occurs on the parity bit for the pitch delay index P1. The latter could occur when the bitstream has been corrupted by random bit errors.

If a parity error occurs on P1, the delay value $T_1$ is set to the value of the delay of the previous frame. The value of $T_2$ is derived with the procedure outlined in Section 4.1.2, using this new value of $T_1$. If consecutive parity errors occur, the previous value of $T_1$, incremented by 1, is used.

The mechanism for detecting frame erasures is not defined in the Recommendation, and will depend on the application. The concealment strategy has to reconstruct the current frame, based on previously received information. The method used replaces the missing excitation signal with one of similar characteristics, while gradually decaying its energy. This is done by using a voicing classifier based on the long-term prediction gain, which is computed as part of the long-term postfilter analysis. The pitch postfilter (see Section 4.2.1) finds the long-term predictor for which the prediction gain is more than 3 dB. This is done by setting a threshold of 0.5 on the normalised correlation $R'(k)$ (Eq. (81)). For the error concealment process, these frames will be classified as periodic. Otherwise the frame is declared nonperiodic. An erased frame inherits its class from the preceding (reconstructed) speech frame. Note that the voicing classification is continuously updated based on this reconstructed speech signal. Hence, for many consecutive erased frames the classification might change. Typically, this only happens if the original classification was periodic.

The specific steps taken for an erased frame are: 

1. repetition of the LP filter parameters, 

2. attenuation of adaptive and fixed codebook gains, 

3. attenuation of the memory of the gain predictor, 

4. generation of the replacement excitation. 
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4.3.1 Repetition of LP filter parameters 

The LP parameters of the last good frame are used. The states of the LSF predictor contain the values of the received codewords $l_i$. Since the current codeword is not available it is computed from the repeated LSF parameters $\hat{\omega}_i$ and the predictor memory from

$$l_i = \left[ \hat{\omega}_i^{(m)} - \sum_{k=1}^{4} m_i^k \, l_i^{(m-k)} \right] \Big/ \left( 1 - \sum_{k=1}^{4} m_i^k \right), \quad i = 1, \ldots, 10. \tag{91}$$

4.3.2 Attenuation of adaptive and fixed codebook gains

An attenuated version of the previous fixed codebook gain is used.

$$g_c^{(m)} = 0.98 \, g_c^{(m-1)}. \tag{92}$$

The same is done for the adaptive codebook gain. In addition a clipping operation is used to keep its value below 0.9.

$$g_p^{(m)} = 0.9 \, g_p^{(m-1)} \quad \text{and} \quad g_p^{(m)} < 0.9. \tag{93}$$

4.3.3 Attenuation of the memory of the gain predictor 

The gain predictor uses the energy of previously selected codebooks. To allow for a smooth continuation of the coder once good frames are received, the memory of the gain predictor is updated with an attenuated version of the codebook energy. The value of $\hat{R}^{(m)}$ for the current subframe $m$ is set to the averaged quantized gain prediction error, attenuated by 4 dB.

$$\hat{R}^{(m)} = \left( 0.25 \sum_{i=1}^{4} \hat{R}^{(m-i)} \right) - 4.0 \quad \text{and} \quad \hat{R}^{(m)} \geq -14. \tag{94}$$
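The per-subframe attenuation rules of Eqs. (92) to (94) can be combined into one state update, sketched below. The struct, its field names and the function name are hypothetical; the predictor memory is assumed to be stored most recent first, and the clipping of the adaptive codebook gain is implemented as a ceiling at 0.9.

```c
#include <assert.h>

/* Decoder gain state carried from subframe to subframe (assumed layout). */
typedef struct {
    double g_p;      /* adaptive codebook gain */
    double g_c;      /* fixed codebook gain */
    double r_hat[4]; /* gain predictor memory in dB, most recent first */
} GainState;

/* Update the gains and the gain predictor memory for one erased subframe. */
void conceal_gains(GainState *st)
{
    double avg = 0.0, r_new;
    int i;

    assert(st != 0);

    st->g_c = 0.98 * st->g_c;  /* Eq. (92) */
    st->g_p = 0.9 * st->g_p;   /* Eq. (93) */
    if (st->g_p > 0.9)
        st->g_p = 0.9;

    /* Eq. (94): mean of the memory, attenuated by 4 dB, floored at -14. */
    for (i = 0; i < 4; i++)
        avg += st->r_hat[i];
    r_new = 0.25 * avg - 4.0;
    if (r_new < -14.0)
        r_new = -14.0;

    for (i = 3; i > 0; i--)
        st->r_hat[i] = st->r_hat[i - 1];
    st->r_hat[0] = r_new;
}
```

Calling this once per erased subframe makes both gains decay geometrically while the predictor memory drifts towards its floor of -14 dB.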

4.3.4 Generation of the replacement excitation 

The excitation used depends on the periodicity classification. If the last correctly received frame was classified as periodic, the current frame is considered to be periodic as well. In that case only the adaptive codebook is used, and the fixed codebook contribution is set to zero. The pitch delay is based on the last correctly received pitch delay and is repeated for each successive frame. To avoid excessive periodicity the delay is increased by one for each next subframe but bounded by 143. The adaptive codebook gain is based on an attenuated value according to Eq. (93).

If the last correctly received frame was classified as nonperiodic, the current frame is considered to be nonperiodic as well, and the adaptive codebook contribution is set to zero. The fixed codebook contribution is generated by randomly selecting a codebook index and sign index. The random generator is based on the function

$$seed = seed \cdot 31821 + 13849, \tag{95}$$

with the initial seed value of 21845. The random codebook index is derived from the 13 least significant bits of the next random number. The random sign is derived from the 4 least significant bits of the next random number. The fixed codebook gain is attenuated according to Eq. (92).
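The generator of Eq. (95) can be sketched as below. The text does not state the word length of the seed; the sketch assumes that the seed is a 16-bit word and that the result wraps around modulo 2^16, in line with the 16-bit fixed-point simulation. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define INITIAL_SEED 21845u

/* Advance the generator of Eq. (95), keeping the low 16 bits (assumed). */
uint16_t next_random(uint16_t *seed)
{
    assert(seed != 0);
    *seed = (uint16_t)((uint32_t)(*seed) * 31821u + 13849u);
    return *seed;
}

/* Random fixed codebook index: 13 least significant bits. */
int random_codebook_index(uint16_t *seed)
{
    return next_random(seed) & 0x1FFF;
}

/* Random sign index: 4 least significant bits of the next number. */
int random_sign_index(uint16_t *seed)
{
    return next_random(seed) & 0x000F;
}
```

Starting from `INITIAL_SEED`, the codebook index and the sign index are drawn from two successive random numbers, matching the wording "the next random number" used for each.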
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5 Bit-exact d scription of the CS-ACELP coder 

ANSI C code simulating the CS-ACELP coder in 16 bit fixed-point is available from ITU-T. The following sections summarize the use of this simulation code, and how the software is organized.

5.1 Use of the simulation software

The C code consists of two main programs: coder.c, which simulates the encoder, and decoder.c, which simulates the decoder. The encoder is run as follows:

    coder inputfile bstreamfile

The inputfile and outputfile are sampled data files containing 16-bit PCM signals. The bitstream file contains 81 16-bit words, where the first word can be used to indicate frame erasure, and the remaining 80 words contain one bit each. The decoder takes this bitstream file and produces a postfiltered output file containing a 16-bit PCM signal:

    decoder bstreamfile outputfile

5.2 Organization of the simulation software 

In the fixed-point ANSI C simulation, only two types of fixed-point data are used as is shown in Table 10. To facilitate the implementation of the simulation code, loop indices, Boolean values and flags use the type Flag, which would be either 16 bit or 32 bits depending on the target platform.

Table 10: Data types used in ANSI C simulation.

    Type      Max. value     Min. value     Description
    Word16    0x7fff         0x8000         signed 2's complement 16 bit word
    Word32    0x7fffffffL    0x80000000L    signed 2's complement 32 bit word

All the computations are done using a predefined set of basic operators. The description of these operators is given in Table 11. The tables used by the simulation coder are summarized in Table 12. These main programs use a library of routines that are summarised in Tables 13, 14, and 15.

Table 11: Basic operations used in ANSI C simulation.

    Operation                                                  Description
    Word16 sature(Word32 L_var1)                               Limit to 16 bits
    Word16 add(Word16 var1, Word16 var2)                       Short addition
    Word16 sub(Word16 var1, Word16 var2)                       Short subtraction
    Word16 abs_s(Word16 var1)                                  Short abs
    Word16 shl(Word16 var1, Word16 var2)                       Short shift left
    Word16 shr(Word16 var1, Word16 var2)                       Short shift right
    Word16 mult(Word16 var1, Word16 var2)                      Short multiplication
    Word32 L_mult(Word16 var1, Word16 var2)                    Long multiplication
    Word16 negate(Word16 var1)                                 Short negate
    Word16 extract_h(Word32 L_var1)                            Extract high
    Word16 extract_l(Word32 L_var1)                            Extract low
    Word16 round(Word32 L_var1)                                Round
    Word32 L_mac(Word32 L_var3, Word16 var1, Word16 var2)      Mac
    Word32 L_msu(Word32 L_var3, Word16 var1, Word16 var2)      Msu
    Word32 L_macNs(Word32 L_var3, Word16 var1, Word16 var2)    Mac without sat
    Word32 L_msuNs(Word32 L_var3, Word16 var1, Word16 var2)    Msu without sat
    Word32 L_add(Word32 L_var1, Word32 L_var2)                 Long addition
    Word32 L_sub(Word32 L_var1, Word32 L_var2)                 Long subtraction
    Word32 L_add_c(Word32 L_var1, Word32 L_var2)               Long add with c
    Word32 L_sub_c(Word32 L_var1, Word32 L_var2)               Long sub with c
    Word32 L_negate(Word32 L_var1)                             Long negate
    Word16 mult_r(Word16 var1, Word16 var2)                    Multiplication with round
    Word32 L_shl(Word32 L_var1, Word16 var2)                   Long shift left
    Word32 L_shr(Word32 L_var1, Word16 var2)                   Long shift right
    Word16 shr_r(Word16 var1, Word16 var2)                     Shift right with round
    Word16 mac_r(Word32 L_var3, Word16 var1, Word16 var2)      Mac with rounding
    Word16 msu_r(Word32 L_var3, Word16 var1, Word16 var2)      Msu with rounding
    Word32 L_deposit_h(Word16 var1)                            16 bit var1 -> MSB
    Word32 L_deposit_l(Word16 var1)                            16 bit var1 -> LSB
    Word32 L_shr_r(Word32 L_var1, Word16 var2)                 Long shift right with round
    Word32 L_abs(Word32 L_var1)                                Long abs
    Word32 L_sat(Word32 L_var1)                                Long saturation
    Word16 norm_s(Word16 var1)                                 Short norm
    Word16 div_s(Word16 var1, Word16 var2)                     Short division
    Word16 norm_l(Word32 L_var1)                               Long norm
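To illustrate what "limit to 16 bits" and saturating arithmetic mean for a few of the operators in Table 11, the following is a minimal re-implementation sketch. It is not the ITU-T simulation code: the typedefs based on `<stdint.h>` are an assumption about the platform, and any side effects the reference operators may have are omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16; /* signed 2's complement 16 bit word */
typedef int32_t Word32; /* signed 2's complement 32 bit word */

/* Limit a 32-bit value to the 16-bit range [0x8000, 0x7fff]. */
Word16 sature(Word32 L_var1)
{
    if (L_var1 > 0x7fff)
        return (Word16)0x7fff;
    if (L_var1 < -0x8000)
        return (Word16)(-0x8000);
    return (Word16)L_var1;
}

/* Short addition with saturation instead of wrap-around. */
Word16 add(Word16 var1, Word16 var2)
{
    return sature((Word32)var1 + (Word32)var2);
}

/* Short subtraction with saturation. */
Word16 sub(Word16 var1, Word16 var2)
{
    return sature((Word32)var1 - (Word32)var2);
}

/* Long multiplication: the product shifted left by one, saturated. Only
 * (-32768) * (-32768) can overflow after the shift. */
Word32 L_mult(Word16 var1, Word16 var2)
{
    Word32 product = (Word32)var1 * (Word32)var2;

    if (product == (Word32)0x40000000L)
        return (Word32)0x7fffffffL;
    return product * 2;
}
```

Saturation keeps an overflowing sum pinned at the nearest representable value, which is why `add(32767, 1)` stays at 32767 rather than wrapping to -32768.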
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Table 12: Summary of tables. 



    File            Table name    Size      Description
    tab_hup.c       tab_hup_s     28        upsampling filter for postfilter
    tab_hup.c       tab_hup_l     112       upsampling filter for postfilter
    inter_3.c       inter_3       13        FIR filter for interpolating the correlation
    pred_lt3.c      inter_3       31        FIR filter for interpolating past excitation
    lspcb.tab       lspcb1        128x10    LSP quantiser (first stage)
    lspcb.tab       lspcb2        32x10     LSP quantiser (second stage)
    lspcb.tab       fg            2x4x10    MA predictors in LSP VQ
    lspcb.tab       fg_sum        2x10      used in LSP VQ
    lspcb.tab       fg_sum_inv    2x10      used in LSP VQ
    qua_gain.tab    gbk1          8x2       codebook GA in gain VQ
    qua_gain.tab    gbk2          16x2      codebook GB in gain VQ
    qua_gain.tab    map1          8         used in gain VQ
    qua_gain.tab    imap1         8         used in gain VQ
    qua_gain.tab    map2          16        used in gain VQ
    qua_gain.tab    imap2         16        used in gain VQ
    window.tab      window        240       LP analysis window
    lag_wind.tab    lag_h         10        lag window for bandwidth expansion (high part)
    lag_wind.tab    lag_l         10        lag window for bandwidth expansion (low part)
    grid.tab        grid          61        grid points in LP to LSP conversion
    inv_sqrt.tab    table         49        lookup table in inverse square root computation
    log2.tab        table         33        lookup table in base 2 logarithm computation
    lsp_lsf.tab     table         65        lookup table in LSF to LSP conversion and vice versa
    lsp_lsf.tab     slope         64        line slopes in LSP to LSF conversion
    pow2.tab        table         33        lookup table in 2^x computation
    acelp.h                                 prototypes for fixed codebook search
    ld8k.h                                  prototypes and constants
    typedef.h                               type definitions
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Table 13: Summary of encoder specific routines. 



    Filename      Description
    acelp_co.c    search fixed codebook
    autocorr.c    compute autocorrelation for LP analysis
    az_lsp.c      compute LSPs from LP coefficients
    cod_ld8k.c    encoder routine
    convolve.c    convolution operation
    corr_xy2.c    compute correlation terms for gain quantisation
    enc_lag3.c    encode adaptive codebook index
    g_pitch.c     compute adaptive codebook gain
    gainpred.c    gain predictor
    int_lpc.c     interpolation of LSP
    inter_3.c     fractional delay interpolation
    lag_wind.c    lag-windowing
    levinson.c    Levinson recursion
    lspenc.c      LSP encoding routine
    lspgetq.c     LSP quantiser
    lspgett.c     compute LSP quantiser distortion
    lspgetw.c     compute LSP weights
    lsplast.c     select LSP MA predictor
    lsppre.c      pre-selection first LSP codebook
    lspprev.c     LSP predictor routines
    lspsel1.c     first stage LSP quantiser
    lspsel2.c     second stage LSP quantiser
    lspstab.c     stability test for LSP quantiser
    pitch_fr.c    closed-loop pitch search
    pitch_ol.c    open-loop pitch search
    pre_proc.c    pre-processing (HP filtering and scaling)
    pwf.c         computation of perceptual weighting coefficients
    qua_gain.c    gain quantiser
    qua_lsp.c     LSP quantiser
    relspwe.c     LSP quantiser
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Table 14: Summary of decoder specific routines. 



    Filename      Description
    d_lsp.c       decode LP information
    de_acelp.c    decode algebraic codebook
    dec_gain.c    decode gains
    dec_lag3.c    decode adaptive codebook index
    dec_ld8k.c    decoder routine
    lspdec.c      LSP decoding routine
    post_pro.c    post processing (HP filtering and scaling)
    pred_lt3.c    generation of adaptive codebook
    pst.c         postfilter routines



20 



Table 15: Summary of general routines.

Filename        Description
basicop2.c
bits.c          bit manipulation routines
gainpred.c      gain predictor
int_lpc.c       interpolation of LSP
inter_3.c       fractional delay interpolation
lsp_az.c        compute LP from LSP coefficients
lsp_lsf.c       conversion between LSP and LSF
lsp_lsf2.c      high precision conversion between LSP and LSF
lspexp.c        expansion of LSP coefficients
lspstab.c       stability test for LSP quantiser
p_parity.c      compute pitch parity
pred_lt3.c      generation of adaptive codebook
random.c        random generator
residu.c        compute residual signal
syn_filt.c      synthesis filter
weight_a.c      bandwidth expansion LP coefficients
1. A method for use in a speech processing system which includes a first portion comprising an adaptive codebook
and corresponding adaptive codebook amplifier and a second portion comprising a fixed codebook coupled to a
pitch filter, the pitch filter comprising a delay memory coupled to a pitch filter amplifier, the method comprising:

determining the pitch filter gain based on a measure of periodicity of a speech signal; and
amplifying samples of a signal in said pitch filter based on said determined pitch filter gain.

2. The method of claim 1 wherein the adaptive codebook gain is delayed for one subframe. 

3. The method of claim 1 where the signal reflecting the adaptive codebook gain is delayed in time. 

4. The method of claim 1 wherein the signal reflecting the adaptive codebook gain comprises values which are 
greater than or equal to a lower limit and less than or equal to an upper limit. 

5. The method of claim 1 wherein the speech signal comprises a speech signal being encoded. 

6. The method of claim 1 wherein the speech signal comprises a speech signal being synthesized. 

7. A speech processing system comprising: 

a first portion including an adaptive codebook and means for applying an adaptive codebook gain, and

a second portion including a fixed codebook, a pitch filter, wherein the pitch filter includes a means for applying
a pitch filter gain,

and wherein the improvement comprises:

means for determining said pitch filter gain, based on a measure of periodicity of a speech signal. 

8. The speech processing system of claim 7 wherein the signal reflecting the adaptive codebook gain is delayed for
one subframe.

9. The speech processing system of claim 7 wherein the pitch filter gain equals a delayed adaptive codebook gain. 

10. The speech processing of claim 7 wherein the pitch filter gain is limited to a range of values greater than or equal
to 0.2 and less than or equal to 0.8 and, within said range, comprises a delayed adaptive codebook gain.

11. The speech processing system of claim 7 wherein the signal reflecting the adaptive codebook gain is limited to a 
range of values greater than or equal to 0.2 and less than or equal to 0.8 and, within said range, comprises an 
adaptive codebook gain. 

12. The speech processing system of claim 7 wherein said first and second portions generate first and second output 
signals and wherein the system further comprises: 

means for summing the first and second output signals, and
a linear prediction filter, coupled to the means for summing, for generating a speech signal in response to the
summed first and second signals.

13. The speech processing system of claim 12 further comprising a post filter for filtering said speech signal generated
by said linear prediction filter.

14. The speech processing system of claim 7 wherein the speech processing system is used in a speech encoder. 

15. The speech processing system of claim 7 wherein the speech processing system is used in a speech decoder. 

16. The speech processing system of claim 5 wherein the means for determining comprises a memory for delaying a
signal reflecting the adaptive codebook gain used in said first portion.

17. A method for determining a gain of a pitch filter for use in a speech processing system, the system including a first
portion comprising an adaptive codebook and corresponding adaptive codebook amplifier and a second portion
comprising a fixed codebook coupled to a pitch filter, the pitch filter comprising a delay memory coupled to a pitch
filter amplifier for applying said determined gain, the speech processing system for processing a speech signal, the
method comprising:

determining the pitch filter gain based on periodicity of the speech signal.

18. A method for use in a speech processing system which includes a first portion which comprises an adaptive
codebook and corresponding adaptive codebook amplifier and a second portion which comprises a fixed codebook
coupled to a pitch filter, the pitch filter comprising a delay memory coupled to a pitch filter amplifier, the method
comprising:

delaying the adaptive codebook gain;

determining the pitch filter gain to be equal to the delayed adaptive codebook gain, except when the adaptive
codebook gain is either less than 0.2 or greater than 0.8, in which cases the pitch filter gain is set equal to 0.2
or 0.8, respectively; and

amplifying samples of a signal in said pitch filter based on said determined pitch filter gain.

19. A speech processing system comprising: 

a first portion including an adaptive codebook and means for applying an adaptive codebook gain, and

a second portion including a fixed codebook, a pitch filter, means for applying a second gain, wherein the pitch
filter includes a means for applying a pitch filter gain,

and wherein the improvement comprises:

means for determining said pitch filter gain, said means for determining including means for setting the pitch
filter gain equal to an adaptive codebook gain, said signal gain is either less than 0.2 or greater than 0.8, in
which cases the pitch filter gain is set equal to 0.2 or 0.8, respectively.
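
The gain rule recited in claims 18 and 19 can be illustrated with a minimal C sketch. It is not the reference implementation. The names PitchGainState, pitch_filter_gain, next_pitch_filter_gain and pitch_filter are hypothetical, and the start-up value of the delayed gain is an assumption left to the caller. The one-tap recursive form code[n] += gain * code[n - delay] is likewise an assumed reading of "a delay memory coupled to a pitch filter amplifier".

```c
/* Limit the delayed adaptive codebook gain to [0.2, 0.8]
 * (the limits are those stated in the claims). */
double pitch_filter_gain(double delayed_acb_gain)
{
    if (delayed_acb_gain < 0.2)
        return 0.2;
    if (delayed_acb_gain > 0.8)
        return 0.8;
    return delayed_acb_gain;
}

/* One-subframe delay: holds the adaptive codebook gain of the previous
 * subframe. The initial value is an assumption chosen by the caller. */
typedef struct {
    double prev_acb_gain;
} PitchGainState;

/* Return the pitch filter gain for the current subframe, derived from the
 * previous subframe's adaptive codebook gain, then store the current
 * adaptive codebook gain for use in the next subframe. */
double next_pitch_filter_gain(PitchGainState *st, double acb_gain)
{
    double gain = pitch_filter_gain(st->prev_acb_gain);
    st->prev_acb_gain = acb_gain;
    return gain;
}

/* Apply the pitch filter in place to a fixed codebook vector: each sample
 * at or beyond the integer pitch delay receives a scaled copy of the
 * (already filtered) sample one delay earlier. */
void pitch_filter(double *code, int len, int delay, double gain)
{
    int n;

    for (n = delay; n < len; ++n)
        code[n] += gain * code[n - delay];
}
```

Because the filter only touches samples at index delay and above, it has an effect only when the integer pitch delay is shorter than the subframe, which is the case the invention addresses.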